INTERACTIONS AMONG OPERATOR VARIABLES,
SYSTEM DYNAMICS, AND TASK-INDUCED
STRESS ¹


W. D. GARVEY and F. V. TAYLOR
Naval Research Laboratory


The present series of experiments compares
the performance of several man-machine
tracking systems when Ss are controlling
under “normal unstressed” conditions and
when controlling under conditions of “task-induced
stress” (Lazarus, Deese, & Osler,
1952). The experiments seek to determine
if the relative efficiencies of two or more
systems are the same when they are operated
under “normal” conditions as when they are
operated under conditions of task-induced
stress. A brief description of the three
experiments follows:


Experiment I


An acceleration control and an acceleration-aided
control system are compared in the
absence of stress and under a variety of stress
conditions.

Experiment II


The same two systems as those used in
Experiment I are used; however, Ss are selected
and divided into two groups so that the
poorer Ss control the acceleration-aided
system and the better Ss control the acceleration
control system. The purpose is to determine
if stressing the human element will produce
differential deterioration in the performances
of two systems which have been equated
through selection of Ss.


Experiment III


Since prolonged training of Ss on the two
systems used in Experiments I and II indicated
that the performance of these would
probably never become equated through training,
this experiment employs the acceleration
control system used in the previous experiments
and a position control system. The
experiment seeks to determine the differential
effects of stress on the performance of two
systems which have been equated through
extensive training of Ss.


¹ The authors wish to thank Jean B. Henson for
her assistance in running Ss and analyzing the data.


Method 


Apparatus and procedure. The experiments were
conducted with the three man-machine systems
shown in Fig. 1. In each system, S serves as a
tracking element and it is his task to keep a dot
on a CRT centered on a stationary hairline by
manipulating a joy stick control. Tracking is in
only one dimension since the dot and joy stick are 
free to move only in the horizontal plane. The dot 
is forced off the hairline by a complex sine wave 
input (system input) which consists of two basic 
frequencies of 2.3 and 3.2 cycles per min.; the rela- 
tive amplitude of each sine wave is inversely pro- 
portional to its frequency. 
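
As a rough illustration of the system input just described, the following Python sketch sums two sine waves at 2.3 and 3.2 cycles per min., each with an amplitude inversely proportional to its frequency. The zero starting phase, the arbitrary `peak` scale, the sampling rate, and the function name `system_input` are assumptions made for illustration; the report does not state them.

```python
import math

# The two basic input frequencies given in the text, in cycles per minute.
FREQS_CPM = (2.3, 3.2)


def system_input(t_sec, peak=1.0):
    """Return the forcing input at time t_sec (seconds).

    Each component's amplitude is inversely proportional to its frequency,
    scaled so that the slower wave has amplitude `peak` (an assumed scale).
    Both components start at zero phase (also an assumption).
    """
    total = 0.0
    for freq in FREQS_CPM:
        amplitude = peak * FREQS_CPM[0] / freq
        total += amplitude * math.sin(2.0 * math.pi * (freq / 60.0) * t_sec)
    return total


# Sample one 60-sec. trial at 10 samples per second (assumed rate).
course = [system_input(i * 0.1) for i in range(600)]
```

With these assumed settings the faster component has about 72% of the amplitude of the slower one (2.3 / 3.2).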

In the Acceleration Control System in Fig. 1, a 
1° displacement of the stick produces an 18 mm./sec.²
acceleration of the dot on the scope. The
Acceleration-aided Control System is equivalent to 
the Acceleration Control System with the exception 
that the response delays, due to the integrators in 
the system, have been eliminated from the display 
of the error to S. A stick movement of 1° imparts 
a displacement of 4.5 mm., a velocity of 9.0 mm./sec.,
and an acceleration of 18.0 mm./sec.² to the dot
on the scope. In the Position Control System, a 1° 
stick displacement produces a 4.5 mm. change in the 
position of the dot on the scope. The performance 
of each system was measured in terms of absolute 
system error, integrated over the last 50 sec. of a 
60-sec. trial. 
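
The three sets of dynamics and the scoring rule described above can be sketched in Python as follows. This is a minimal sketch, not the analog apparatus used in the experiments: the discrete-time rectangular integration, the sampling interval, and the names `dot_response` and `integrated_absolute_error` are assumptions. Only the gains (4.5 mm., 9.0 mm./sec., and 18.0 mm./sec.² per degree of stick) and the 50-sec. scoring window come from the text.

```python
# Gains per degree of stick displacement, as stated in the text.
POSITION_GAIN = 4.5       # mm per degree
VELOCITY_GAIN = 9.0       # mm/sec per degree
ACCELERATION_GAIN = 18.0  # mm/sec^2 per degree


def dot_response(stick_deg, dt, system):
    """Dot displacement (mm) produced by a sampled stick signal (degrees)."""
    first_integral = 0.0   # running integral of stick displacement
    second_integral = 0.0  # running integral of the first integral
    response = []
    for stick in stick_deg:
        first_integral += stick * dt
        second_integral += first_integral * dt
        if system == "position":
            value = POSITION_GAIN * stick
        elif system == "acceleration":
            value = ACCELERATION_GAIN * second_integral
        elif system == "acceleration-aided":
            value = (POSITION_GAIN * stick
                     + VELOCITY_GAIN * first_integral
                     + ACCELERATION_GAIN * second_integral)
        else:
            raise ValueError(f"unknown system: {system}")
        response.append(value)
    return response


def integrated_absolute_error(error_mm, dt, trial_sec=60.0, scored_sec=50.0):
    """Integrate |error| over the last `scored_sec` seconds of a trial."""
    first_scored = int(round((trial_sec - scored_sec) / dt))
    return sum(abs(e) for e in error_mm[first_scored:]) * dt


# A 1-degree stick displacement held for 1 sec., sampled every millisecond.
step = [1.0] * 1000
final = {name: dot_response(step, 0.001, name)[-1]
         for name in ("position", "acceleration", "acceleration-aided")}
```

Under these assumptions, holding the stick at 1° for 1 sec. moves the dot 4.5 mm. with position control, about 9 mm. with acceleration control, and about 22.5 mm. with the acceleration-aided dynamics, which shows why the aided system responds without the lag introduced by the integrators.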

Forty-eight naval enlisted men served as Ss in the
three experiments, 16 Ss per experiment. Before
each experiment all Ss who were to be used were
given 20 1-min. trials on each of the two systems
under study. The error scores obtained during these
trials were used to select Ss for the experimental
groups. The group assignments were as follows:
In Experiment I, 16 Ss were divided into two groups
of eight, matched in terms of means and variances
of the performance scores of both systems during
the pre-experimental trials. For Experiment II, 16
new Ss were divided into two groups of eight, using
performance data from the pre-experimental trials
as criteria. The eight best Ss were assigned to
operate the acceleration control system and the eight
poorest Ss were assigned to operate the acceleration-aided
system. In Experiment III, the remaining 16
Ss were divided into two matched groups as in
Experiment I.

Fig. 1. Simplified block diagrams of three manual tracking systems employed.
The legend identifies the symbols that stand for algebraic addition, for
integrator, and for amplifier. (Diagrams not reproduced.)

Once assigned to a specific group, Ss operated only 
one system throughout the remainder of the experi- 
ment. The Ss in all experiments received equivalent 
training, which consisted of 23 sessions of 10 60-sec. 
trials. Two sessions per day were given, one in the 
morning and one in the afternoon. The Ss were 
told their scores and encouraged to try to improve 
their performance with each trial. 

At the end of the 23rd training session in all three 
experiments Ss were required to operate the system 
under a series of conditions of task-induced stress. 
Before each stress condition Ss were given five 60- 
sec. trials without stress, followed by an explanation 
of how the next session would differ from the training
sessions. With the exception of the condition of
prolonged trials, Ss were given 10 60-sec. trials under
each of the stress conditions. Each variety of stress
was presented on a different day in the following
order: ²

1. Prolonged trials: The Ss were required to oper- 
ate the system continuously for 60 min. Error 
scores were recorded at the end of every 5-min. 
period during the hour. 

2. Incompatible display-control relationship: The 
tracking task was performed with a display-control 
relationship which was the reverse of that on which 
Ss were trained. 

3. Left-hand tracking: Under this condition, Ss 
were required to track with their left hands. (During 
the pre-stress training they tracked with their right 
hands.) 

4. Two-hand tracking: Two targets were tracked 
simultaneously, one with a right-hand control and 
one with a left-hand control. The left-hand system 
was identical with the right-hand system; the two 
dots followed the same course input, which was that 
employed during the training sessions. Only per- 
formance with the right-hand control is used in 
analyzing the results. 

5. Two-coordinate tracking: The dot and stick 
were free to move in two coordinates and Ss were 
required to track the target in both. The dot was 
controlled with the right stick only. The course 
input to the system was that used during training, 
except that its path of movement was rotated 45° 
from the horizontal axis. Only the performance of 
the system in the horizontal coordinate is used in 
analyzing the results. 

6. Secondary visual task: The tracking task during 
this condition was the same as that employed during 
the training sessions. In addition to this, Ss were 
required to perform a second task which consisted 
in detecting and reporting the range and bearing of 
targets on a simulated radar scope. A measure was 
obtained of the accuracy of Ss’ reports of targets 
as well as their tracking scores. 

7. Secondary arithmetic task: While performing 
the tracking task, Ss were required to solve two- 
digit subtractions at a very rapid rate. In order to 
obtain some measure of Ss’ arithmetic ability they 
were given the same arithmetic problems without 
tracking on the following day. 


Results 


Performance is expressed in terms of median
integrated error scores. This is necessitated
by the fact that under several conditions
of stress, some Ss lost the target before the
end of the full 60-sec. trial. On such occasions,
maximum scores of 100 were recorded.
When a median score of 100 is presented, this
indicates that the targets were lost on 50%
or more of the trials under that condition.


Fig. 2. System performance as a function of training.
The top graph shows the results from Experiment I;
the middle graph, from Experiment II; and
the bottom graph, from Experiment III. (Graphs not
reproduced. Experiments I and II plot the acceleration
and acceleration-aided systems; Experiment III plots
the acceleration and position systems. The abscissa
is training sessions.)


² For further description of the stress conditions,
see Garvey (1957, pp. 3-4).


Effects of training. The results of training 
in Experiment I are shown in the top graph 
of Fig. 2. The performance of the accelera- 
tion-aided control system is substantially bet- 
ter than that of the acceleration control 
system. Using the mean of all trials within 
a session for each S as a datum, the difference 
in system performance for each session was 
tested with Wilcoxon’s (1949) test for un- 
matched replicas and was found to be signifi- 
cant throughout the training sessions (p 
< 0.01). 

The middle graph in Fig. 2 shows the results
obtained in Experiment II. The differences
between the two systems, when tested with
Wilcoxon’s test, were not significant at any
point in the training period (p > 0.25).

The results of Experiment III are shown in
the bottom graph in Fig. 2. During the first
seven sessions, performance of the acceleration
control system was significantly poorer
than that of the position control system
(p < 0.05) when tested with Wilcoxon’s test.
With the exception of Sessions 13 and 15, the
differences between the two systems were not
reliable throughout the remainder of the training
sessions (p > 0.05).


Fig. 3. System performance as a function of prolonged periods of continuous
tracking. The left-hand graph shows the results from Experiment I; the
middle graph, from Experiment II; and the right-hand graph, from
Experiment III.


Effect of stress. The graphs in Fig. 3 show
the results of the 60-min. continuous-tracking
trials. The measure of performance is given
in terms of median integrated error per min.
for each 5-min. tracking period of the 60-min.
trial.

Using Wilcoxon’s (1949) test for comparison
of several treatments, it was found in
Experiment I that performance deteriorated
significantly for both systems with the increased
duration of the trial (p < 0.05). The
interaction between systems and duration of
trial, when tested with Wilcoxon’s (1949) test
of interactions, was not significant (p > 0.05).
This is taken to indicate that the differential
deterioration shown in Fig. 3 is not significant.


Table 1

System Performance Deterioration Under Stress
(Median Integrated Error)

(Table body not reproduced.)


In Experiment II, Wilcoxon’s (1949) test 
indicated that the performance of both sys- 
tems deteriorated significantly (p < 0.05) 
with continuous running. With the exception 
of the 30- and 35-min. periods, performance 
of the acceleration control system was re- 
liably poorer (p < 0.05) than that of the 
acceleration-aided system after the first 10 
min. of tracking. 

Performance of both systems deteriorated 
significantly in Experiment III with duration 
of trial (p < 0.05). Except for the 5-, 15-,
55-, and 60-min. periods, performance of the
acceleration control system was significantly
poorer than that of the position control system
(p < 0.05). 
Fig. 4. System performance unstressed and under
several conditions of task-induced stress. The stress
conditions are from left to right: Incompatible
Display-Control Relationship, Left-hand Tracking,
Two-hand Tracking, Two-coordinate Tracking,
Secondary Visual Task, and Secondary Arithmetic Task.

The results of the other stress conditions
are summarized in Table 1 and are presented 
graphically in Fig. 4. The “unstressed” con- 
dition in Fig. 4 represents an average of the 
medians of system performance on the last 
five training sessions. The columns labeled 
Amount in Table 1 represent the median dif- 
ference between system performance under 
a stress condition and the unstressed condi- 
tion. The columns labeled Difference repre- 
sent the difference in the amount of deterio- 
ration between the performances of the two 
systems employed in each experiment. Wil- 
coxon’s (1949) matched replicas test was used 
to obtain the p values for the amount of de- 
terioration and his unmatched replicas test 
was used to obtain the p values for difference 
in deterioration. 
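
The Amount and Difference quantities described above can be illustrated with a short Python sketch. It assumes one reading of the text: that Amount is the median of the per-S differences between stressed and unstressed scores, and that the unmatched replicas test is a rank-sum test on those differences. The scores below are hypothetical and are not data from the experiments; the function names are illustrative, and no p values are computed.

```python
from statistics import median


def deterioration(unstressed, stressed):
    """Per-S change in error score (stressed minus unstressed)."""
    return [s - u for u, s in zip(unstressed, stressed)]


def rank_sum(sample_a, sample_b):
    """Sum of the pooled ranks of sample_a, using midranks for ties."""
    pooled = sorted(sample_a + sample_b)

    def midrank(value):
        first = pooled.index(value) + 1  # 1-based rank of first occurrence
        ties = pooled.count(value)
        return first + (ties - 1) / 2.0

    return sum(midrank(v) for v in sample_a)


# Hypothetical error scores for two groups of four Ss (not from the paper).
accel_unstressed, accel_stressed = [10, 12, 9, 11], [30, 41, 25, 100]
aided_unstressed, aided_stressed = [5, 6, 4, 7], [9, 12, 8, 10]

accel_change = deterioration(accel_unstressed, accel_stressed)
aided_change = deterioration(aided_unstressed, aided_stressed)

amount_accel = median(accel_change)       # "Amount" for one system
amount_aided = median(aided_change)       # "Amount" for the other system
difference = amount_accel - amount_aided  # "Difference" between systems
w_accel = rank_sum(accel_change, aided_change)
```

In this made-up example every S in the first group deteriorates more than every S in the second, so the rank sum for the first group takes its largest possible value for groups of four (5 + 6 + 7 + 8 = 26).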

The results shown in Table 1 and Figs. 3
and 4 may be summarized as follows:

Experiment I. The performance of the 
acceleration-aided control system was consid- 
erably better than that of the acceleration 
control system throughout the entire training 
period. When these systems were operated 
under conditions which stressed the human 
operator this difference between system per- 
formance was accentuated. 

Experiment II. Under a majority of the 
stress conditions, performance of the accelera- 
tion control system was significantly poorer 
than that of the acceleration-aided control 
system. 

Experiment III. Under conditions which 
stressed the operator, performance for both 
systems deteriorated, but the deterioration 
was greater for the acceleration control 
system. 

Brief mention should be made of perform- 
ance on the secondary tasks. In none of the 
experiments were reliable differences in target- 
detection performance found between groups 
when tested with Wilcoxon’s (1949) test for 
unpaired replicas (p > 0.05). Likewise, no 
reliable differences (p > 0.05) were found in 
the arithmetic performances of the groups 
in Experiments I and III (either when track- 
ing or when not). However, in Experiment 
II, Ss in the acceleration-control group did 
better (p < 0.05) than Ss in the acceleration- 
aided control group (both when tracking and 
when not). These results are taken as evidence
that the differential deterioration of
system performance under these conditions
was not due to differential effort on the
secondary task at the expense of the primary
task.


Discussion 


Many previous studies have indicated that 
the nature of the dynamics is an important 
variable in determining tracking system per- 
formance under laboratory conditions. Aided 
tracking is generally found to be superior to 
unaided (Birmingham & Taylor, 1954; Cher- 
nikoff, Birmingham, & Taylor, 1955; Fox- 
boro Co., 1945) and position control has been 
shown to be better than pure velocity control 
with medium and high frequency inputs 
(Chernikoff & Taylor, 1957; Lincoln & Smith, 
1952). One of the purposes of the present 
study was to determine whether or not the 
order of merit of tracking systems, estab- 
lished in the absence of stress through 
manipulation of system dynamics, would be 
altered by task-induced stress. The findings 
are clear in indicating that such is not the 
case for the systems studied. The better 
systems retained their advantage under stress. 

This result held up even when the dynamic 
variable was counterbalanced by selection (Ex- 
periment II) and by training (Experiment 
III) in such a fashion as to equate entirely 
measured system performance before stress 
was applied. Thus, within the confines of 
the present studies the “engineering” variable 
of system dynamics proved to be ascendant 
over the “psychological” variables of selection 
and training in determining relative perform- 
ance under stress. 

It is recognized that the generalization of 
these findings to other dynamic variables, 
different types of systems, other varieties of 
stress and forms and degrees of selection and 
training not employed here would be very
hazardous. It is entirely conceivable that 
system dynamics could be adjusted in such 
a way as to cause the better of two man- 
machine systems (operated in the absence of 
stress) to become the less proficient under 
stress. Likewise, there are undoubtedly cir- 
cumstances in which the level of ability of 
the human operator or the extent of his training
will have more to do with man-machine
system performance under stress than will
machine variables. Certainly, when system 
dynamics are held constant, the proficiency 
level of the human operator becomes an ex- 
tremely important predictor of stressed per- 
formance. 

What is clearly needed is a far better 
understanding of (a) the information processing
requirements of the operator in different
systems, (b) how men differ in their
ability to meet these requirements, (c) how 
training enhances human information process- 
ing, (d) how different types of stress act to 
degrade this aspect of human performance, 
and finally, (e) how different system configu- 
rations reflect changes in the operator’s in- 
formation handling capacity. Until these 
fundamental questions are answered, the 
ability to predict the way in which different 
variables will affect the performance of man- 
machine systems will remain most limited and 
very much a matter of ad hoc research. 

Nevertheless, although the present findings 
are of restricted generality, they do have 
practical significance. First, they show that 
the performance of some systems is disrupted 
more by task-induced stress than others. 
Thus, there is no single “degradation factor” 
by which unstressed performance can be mul- 
tiplied to predict performance “in the field.” 
Second, as a corollary of the above, man- 
machine systems which may be found to 
differ only slightly in a laboratory evaluation 
may differ very considerably under the exi- 
gencies of stressful field operations. Third, 
the variable of system dynamics may be of 
paramount importance in determining the 
“stress-resistance” of a man-machine system. 
Finally, the use of operator selection and 
training as a substitute for proper system de- 
sign, although it may work in the laboratory, 
is a highly questionable procedure if the hu- 
man components of the system will have to 
work under stress. 


Summary 


Three experiments were conducted to determine
the effect of stressing the human element
in a man-machine system on the performance
of the system.

In Experiment I, matched Ss were trained 
to operate an acceleration control system and 
an acceleration-aided control system. Performance
of the acceleration-aided control
system was found to be superior to that of the
acceleration control system during training.
When stressed, performance of both systems
deteriorated and the difference between the
performances of the two systems was accentuated.

In Experiment II, the difference in the un- 
stressed performance of the two systems used 
in Experiment I was eliminated by selecting 
poor trackers to operate the acceleration-aided 
system and good trackers to operate the ac- 
celeration control system. Even though per- 
formance of the two systems was equated by 
this selection procedure, the acceleration con- 
trol system deteriorated to a greater extent 
under the majority of the stress conditions. 

In Experiment III, matched Ss were trained 
(with equal amounts of practice) to operate 
a position control system and an acceleration 
control system. At the beginning of train- 
ing, performance on the two systems was 
found to differ; however, this difference was 
gradually eliminated through training. When 
these trained Ss were stressed, performance of 
both systems deteriorated and that of the 
acceleration control system deteriorated to a 
greater extent. 


The results of these experiments are discussed
relative to training, selection, and the
design of man-machine systems.


Received July 22, 1958. 
THE FRAMES OF REFERENCE OF FLYING
INSTRUCTORS


RICHARD WANT 
Department of Air, Australia 


Krumboltz and Christal (1957) have shown 
that in assessing the capacities of their stu- 
dents flying instructors employ a relative 
rather than an absolute frame of reference. 
They point out that, as each instructor usu- 
ally has only four students, variations in 
standards that occur as a result of shifting 
frames of reference can have serious con- 
sequences: 


One student grouped with highly talented fellow
students might fail while another student of equal
or even less ability might pass because he had happened
to be placed with students of low ability.
If such a condition prevails, the Air Force is not
getting the best possible pilots, deserving men are
failing, and the true validity of the pilot stanine is
not being estimated accurately (Krumboltz & Christal,
1957, p. 409).


A copy of the paragraph containing the 
above quotation was circulated to a number
of flying instructors in the Royal Australian
Air Force. The almost unanimous response
was, “That sort of thing would not happen 
here.” A senior officer summed up the gen- 
eral attitude by saying, “Flying instructors 
don’t scrub their pupils because they are not 
as good as their mates; they scrub them be- 
cause they aren’t able to fly.” 

The aim of this paper is to present some 
figures to indicate that the frames of refer- 
ence of flying instructors in this country are 
influenced by variations in the quality of 
members entering training, despite their opin- 
ion to the contrary. These figures were ob- 
tained fortuitously, as a result of the intro- 
duction of a new pilot selection system in 
the Air Force. 


Method 


In Australia, Naval pilot trainees are trained in 
Air Force units. They wear Naval uniforms but 
live and do their training side by side with Air 
Force trainees. An effort is made to treat the 
members of the two services on a basis of scrupu- 
lous equality. If one ignores the fact that they are, 
as a rule, in a minority, there does not appear to 
be any significant feature in which the situation of
the Naval trainees differs from that of those in the
Air Force.

The trainees of the two services are selected
separately. Until May, 1955, the methods of selec- 
tion used by the two services, though not identical, 
were similar. Candidates completed a battery of 
intelligence tests and were interviewed by a selection 
board. All Naval trainees were commissioned on 
the completion of their training. At the time these 
figures were recorded, the Air Force trainees gradu- 
ated as sergeants, and were considered for commis- 
sions at a later date. The Naval system of selection 
therefore placed slightly more emphasis on the can- 
didate’s general level of education and his “officer 
qualities.” 

In May, 1955, the Air Force introduced a new 
selection system incorporating a “pilot stanine,” 
which was based on tests which had been validated 
in the United States (Flanagan, 1948) and in Eng- 
land. The Naval selection system remained un- 
altered. 

Success and failure rates are reported for Air 
Force and Naval students entering training during 
two periods. The first, from January, 1953, to 
April, 1955 (before the introduction of the new 
system in the Air Force) is referred to as “before”; 
the second, from May, 1955, to April, 1957 (after 
the introduction of the new system in the Air 
Force) is referred to as “after.” 

The students considered as successful completed 
all aspects of flying training, including ground train- 
ing, and graduated as pilots. Those considered as 
failures were removed from training on account of 
inability to learn to fly. Those removed from train- 
ing on medical grounds, for academic failure, or at 
their own request, were excluded from the study. 

Pilot stanine scores for the students entering the 
Air Force in the period “after” were correlated with 
the pass-fail criterion. Psychological test results 
of 81 students who were tested under the old sys- 
tem, and who entered the service during 1953, were 
also correlated with this criterion. 


Results 


The figures for the number of students 
passing and failing in the period “before” 
are given in Table 1, and those for the period 
“after” in Table 2. 

Table 1

Pass-Fail Rate in Period “Before”

Service      No. Passing   No. Failing   Total   Percentage Failing
Air Force        108            55        163          33.7%
Navy              45            29         74          39.2%


Table 2

Pass-Fail Rate in Period “After”

(Table body not reproduced.)


The stanine scores of the 91 Air Force
members in the sample for the period “after”
ranged from 3 to 9. A few members with
scores of 1 and 2 were eliminated before the
actual commencement of flying training. The
biserial correlation between those scores and
the pass-fail criterion was .40 (this result
is significant at the .001 level). Corrected for
attenuation this correlation rises to .52.
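
The correction for attenuation mentioned above is not spelled out in the text. The Python sketch below assumes the standard Spearman formula, in which the observed correlation is divided by the square root of the product of the two reliabilities, and works backward from the reported values (.40 and .52) to the reliability product they would imply. The reliabilities themselves are not reported, so the implied figure is only a consequence of that assumption; the function names are illustrative.

```python
import math


def correct_for_attenuation(r_observed, reliability_x=1.0, reliability_y=1.0):
    """Spearman's correction: r / sqrt(r_xx * r_yy)."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


def implied_reliability_product(r_observed, r_corrected):
    """Reliability product that would turn r_observed into r_corrected."""
    return (r_observed / r_corrected) ** 2


# Values reported in the text: .40 observed, .52 after correction.
implied = implied_reliability_product(0.40, 0.52)
recovered = correct_for_attenuation(0.40, reliability_y=implied)
```

Under this assumption the reported correction corresponds to a reliability product of roughly 0.59, for example a criterion reliability of about .59 if the predictor were treated as perfectly reliable.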

The test results of the 81 members tested 
under the old system yielded a correlation 
of —.09, which was not statistically sig- 
nificant. 

A study of Tables 1 and 2 reveals that, 
though there was a decrease in the failure 
rate in the Air Force after the introduction 
of the new system, this decrease was small 
and not statistically significant. The failure 
rate in the Air Force was lower than that 
in the Navy during both periods. During 
the period “before” the difference was not 
statistically significant. During the period 
“after” it was significant at the .001 level 
(it may be noted that the Naval failure rate 
was more than double that of the Air Force). 
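
The failure percentages for the period “before” can be checked with a few lines of Python. The sketch takes the legible counts (Air Force: 108 passing, 55 failing; Navy: 45 passing, 29 failing) as the “before” figures, an assumption supported by the Air Force “before” totals in Table 3, which sum to 163. The author does not name the significance test used, so the uncorrected chi-square test on the 2 x 2 table shown here is an assumed stand-in, not the original analysis.

```python
def percent_failing(passing, failing):
    """Percentage of trainees removed for inability to learn to fly."""
    return 100.0 * failing / (passing + failing)


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


air_force = (108, 55)  # (passing, failing), assumed period "before"
navy = (45, 29)        # (passing, failing), assumed period "before"

af_pct = percent_failing(*air_force)
navy_pct = percent_failing(*navy)
chi2 = chi_square_2x2(*air_force, *navy)

# Critical value of chi-square with 1 degree of freedom at the .05 level.
CRITICAL_05_1DF = 3.841
significant = chi2 > CRITICAL_05_1DF
```

On these counts the two failure rates are 33.7% and 39.2%, and the chi-square value falls well below the .05 critical value, which agrees with the statement that the difference in the period “before” was not statistically significant.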

In order to determine whether the change
in wastage patterns thus noted had been
gradual, or whether it had occurred steeply,
the results for the period “before” were
subdivided into three phases, and those for the
period “after” into two phases, in accordance
with the dates at which members entered
training. The resulting figures appear in
Table 3.


Discussion 


Table 3 indicates that the failure rate in 
the Navy did, in effect, rise steeply, and pre- 
cisely at the same time as the Air Force 
members selected by the new method entered 
training. 

With increasing industrialization in this 
country, there has been a gradual decline in 
the numbers of young men offering for tech- 
nical training in the services. There is no 
reason to believe, however, that any sudden 
change in this regard occurred during 1955, 
nor is there any reason to believe that any- 
thing occurred at this time that would have 
affected the quality of the candidates offer- 
ing for the Navy without at the same time 
affecting the quality of those offering for the 
Air Force. It would seem reasonable to infer,
therefore, that the sudden change in the
wastage patterns in the two services was related
to the introduction of the new Air
Force selection system. The existence of a
significant correlation between the new system
and the pass-fail criterion, and the lack
of correlation in the case of the old system,
would suggest that the new system did, in
effect, make a contribution to the efficiency
of selection. An increase in the failure rate
in the students not selected by the new
method is, under the circumstances, just what
one would have predicted on the basis of the
principle stated by Krumboltz and Christal.


Table 3

Pass-Fail Rate in Periods “Before” and “After” Broken Up into Phases

                              Air Force               Navy
                           Total    Percentage     Percentage
Date of Entry              Number   Failing        Failing

Period “Before”
Jan. ’53 to Aug. ’53         53      30.1%          31.8%
Sept. ’53 to Aug. ’54        46      37.0%          46.7%
Sept. ’54 to April ’55       64      34.4%          36.4%
Period “After”
May ’55 to May ’56           42      26.7%          63.6%
June ’56 to April ’57        49      36.7%          68.0%

(The Navy Total Number column is not reproduced.)


It is to be noted that there were approxi- 
mately twice as many Air Force members as 
Naval members under training at any one 
time. The frames of reference of the instruc- 
tors would, therefore, be influenced to a ma- 
terial extent by any change in the general 
standard of the former. It is also to be noted 
that an improved selection system would tend 
to decrease appreciably the number of train- 
ees in the lower range, thus rendering con- 
spicuous to their instructors those who hap- 
pened to be in this range and exposing them 
to greatly increased risk of failure. 

If the new system has made a contribution 
to the efficiency of selection, as claimed, why 
has there not been a significant reduction in 
the failure rate? 

Since 1955, the Air Force has been expe- 
riencing the results of technical changes. Air- 
craft have become more costly. Reciprocating 
engines have been yielding to jets, and the
view has been put forward that all students
who graduate should be capable of converting 
to jets. It would not be surprising if pres- 
sures arising out of these factors had caused 
flying instructors to raise their standards. 
It is, nevertheless, a point of considerable 
interest that, if they have raised their stand- 
ards, they do not seem to be aware that they 
have done so. 


Summary 


This paper examines the failure rates in 
Air Force and Naval trainees trained side by 
side. The method of selection of Air Force 
trainees was altered at a given point of time, 
but the method of selection of the Naval 
trainees remained unaltered. Although no 
significant change was noted in the failure 
rate in the Air Force trainees, the failure rate 
in Naval trainees rose steeply. It was argued 
that this change in the failure rate of the 
Naval trainees could be explained in terms 
of a change in the frames of reference of 
flying instructors. 
PERSONALITY CORRELATES OF SOCIOMETRIC
STATUS


CARROLL E. IZARD 
Vanderbilt University 


Current sociometric ranking and rating tech- 
niques were derived from sociometry, a method 
advanced by Moreno (1934) for analyzing 
the feeling or preference relationships among 
the members of a human group. The original 
sociometric device as modified by various in- 
vestigators has been used in measuring the 
effects of psychotherapy (Kelman & Parloff, 
1957), social adjustment (Izard, Rosenberg, 
Bair, & Maag, 1953), and leadership poten- 
tial (Hollander, 1953; Izard & Rosenberg, 
1958; McClure, Tupes, & Dailey, 1951; 
Stogdill, Scott, Elton, Jaynes, Miller, Fleish- 
man, Wherry, & Bakan, 1953). Sociometric 
measures have been found reliable (Ander- 
halter, Wilkins, & Rigby, 1952; McClure et 
al., 1951; Wherry & Fryer, 1949) and signifi- 
cantly related to such criteria as academic 
grades (Williams & Leavitt, 1947), ratings 
of superiors (Hollander, 1953), graduation- 
elimination (Hollander, 1953; McClure et al., 
1952), and on-the-job ratings (McClure et al., 
1952; Wherry & Fryer, 1949). The present 
paper reports three studies of the personality 
correlates of sociometric status. 

A measure of sociometric status in terms of 
leadership was chosen for two reasons: it has 
been shown to be highly reliable (Webb, 
1954); and leadership seemed closely related 
conceptually to the usual notion of status, 
especially since the groups being studied were 
military. 


1 Begun while with Tulane University—ONR Proj- 
ect NR 154-098, at the U. S. Naval School of Avia- 
tion Medicine, Pensacola, Florida. Opinions or con- 
clusions contained in this report are those of the 
author. They are not to be construed as necessarily 
reflecting the views or possessing the endorsement of 
the Navy Department. 

2 John H. Manhold, now with Washington Univer- 
sity School of Dentistry, collaborated with the au- 
thor on the first of the three studies presented in this 
paper. A full report of their joint effort was printed 
as U. S. Naval School of Aviation Medicine Rep. No. 
NM 001 077.01.05 and read at the Southern Society 
for Philosophy and Psychology, Atlanta, Georgia, 
1954. 


Over-All Adjustment—General Medical and 
Psychogenic Factors 


In the first study it was hypothesized that: 
(a) a group of cadets with a high number of 
dispensary visits and hospitalizations would 
have a lower mean sociometric leadership 
score than the cadet population; and (b) that
within this group, cadets judged to be in 
a psychogenic or psychosomatic classification 
would have a lower mean sociometric leader- 
ship score than the remainder of the group. 


Procedure 


Subjects. The sample selected for this study con- 
sisted of the 26 classes (N = 1080) who entered the 
Naval Air Training Program during the first half of 
1953. The Ss ranged from 18 to 27 years of age. 
They had at least two years of college education or 
its equivalent. They were selected for the program 
on the basis of an individual interview conducted by 
a flight surgeon, a battery of psychometric measures, 
and the usual physical examination for naval avia- 
tion candidates. 

The health data. The investigator obtained a 
complete record of all dispensary and hospital visits 
made by the 1080 cadets during the pre-flight course 
and the first three stages of flight training. These 
records made it possible to collate for each individual 
an eight month cumulative medical history which 
showed the date reported to dispensary, complaint, 
diagnosis, treatment, and disposition of case.

The sociometric measure of leadership. The socio- 
metric measure was a peer nomination form which 
carried a definition of leadership and instructions to 
nominate in order the three best and three least 
qualified Ss for leadership positions in the program 
in which the group was participating. The socio- 
metric measure was administered to cadet groups, 
each consisting of about 20 men who had been living 
and working together for 13 weeks. The resulting 
ordinal data were normalized by means of Fisher’s 
rankit transformation (Fisher & Yates, 1953). 


Results 


Of the 1080 cadets in the total sample, 167 
had made five or more visits to the dispensary 
or hospital during the eight-month period. 
Sociometric data were available on 127 of 
these Ss. Their mean sociometric leadership
score in terms of rankits was -.259; the
standard deviation was .985. The t test for
the difference between the observed mean and
the population mean yielded a value of 2.98,
P < .01.
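
As a rough check, the t value reported above can be approximated from the summary statistics alone. The following Python sketch assumes the usual one-sample formula, t = (M - mu) / (SD / sqrt(N)), with the rankit population mean taken as zero. The function name is illustrative only, and because the report does not say which variance estimate was used, the result agrees with the published 2.98 only approximately.

```python
import math


def one_sample_t(mean, sd, n, mu=0.0):
    """t statistic for a sample mean tested against a fixed population mean."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu) / standard_error


# Summary statistics reported for the 127 high-visit cadets with sociometric data.
t_value = one_sample_t(mean=-0.259, sd=0.985, n=127)

# The sign is negative because the group mean lies below the population mean.
print(round(abs(t_value), 2))
```

The same function applies to the later comparisons of the psychosomatic and nonpsychosomatic group means with zero.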

Two judges working independently and with- 
out knowledge of sociometric scores classified 
the 167 Ss who made five or more dispensary 
visits into a “psychosomatic” and a “nonpsy- 
chosomatic” group. The judges agreed on 
139 of the 167 Ss classified. Interjudge reli-
ability, as measured by Kendall’s (1948) tau,
was .66; the Pearson product-moment coeffi-
cient estimated from this value was .86. Sub-
sequent statistical analyses were concerned
only with those Ss on whom both judges 
agreed and for whom sociometric data were 
available. The final psychosomatic group had 
56 Ss and the nonpsychosomatic had 47. 
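
The report does not say how the Pearson coefficient was estimated from tau. A minimal sketch, assuming the standard relation r = sin(pi * tau / 2) that holds for a bivariate normal distribution, reproduces the reported figure; the function name is illustrative.

```python
import math


def pearson_from_kendall_tau(tau):
    """Estimate Pearson r from Kendall's tau, assuming bivariate normality."""
    return math.sin(math.pi * tau / 2.0)


r_estimate = pearson_from_kendall_tau(0.66)
print(round(r_estimate, 2))  # 0.86
```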

The psychosomatic group had a mean socio-
metric score of -.551 and a standard devia-
tion of .940. For the nonpsychosomatic group
the mean was .199, the standard deviation
.914. The Fmax test (Bliss & Calhoun, 1953)
showed that the variances were homogeneous.
The analysis of variance of the sociometric 
scores presented in Table 1 indicated that the 
two groups had significantly different means. 
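
The entries of Table 1 can be approximated from the group sizes, means, and standard deviations given in the text. The sketch below is an illustration under the assumption of an ordinary one-way analysis of variance computed from summary statistics; the helper name is hypothetical, and small discrepancies from the published values are expected because the means are reported to only three decimals.

```python
def anova_from_summary(groups):
    """One-way ANOVA from (n, mean, sd) triples.

    Returns (mean square between, mean square within, F ratio).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (total_n - len(groups))
    return ms_between, ms_within, ms_between / ms_within


# (n, mean, sd) for the psychosomatic and nonpsychosomatic groups.
groups = [(56, -0.551, 0.940), (47, 0.199, 0.914)]
ms_between, ms_within, f_ratio = anova_from_summary(groups)
print(round(ms_between, 2), round(ms_within, 4), round(f_ratio, 2))
```

Because the function accepts any number of groups, it can be applied equally to the three performance groups of the second study.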

The mean sociometric values of the psycho- 
somatic and nonpsychosomatic groups were 
also compared with the population mean of 
zero. For the psychosomatic group the t was
4.37, P < .001; for the nonpsychosomatic
group the t was 1.50, P > .10.

With respect to number of medical com-
plaints, the mean and standard deviation for
the psychosomatic group were 10.84 and 7.09,
respectively. Comparable statistics for the
nonpsychosomatic group were 6.53 and 1.67. 
The fact that the difference in these means
was significant at the .01 level suggested that
the psychosomatic group might be lower on
leadership chiefly because it had a greater
mean frequency of medical complaints. To
test this latter possibility, the psychosomatic
group was subdivided into low and high halves
with respect to frequency of medical complaints.
The mean number of medical complaints
for the low half of the psychosomatic
group was 6.56. This was practically identical
with the mean of 6.53 for the nonpsychosomatic
group. However, for the low-frequency
psychosomatic group the mean sociometric
score of -.55 was significantly lower
(P < .01) than that for the nonpsychosomatic
group and identical with that for the
high-frequency and over-all psychosomatic groups.


Table 1


Analysis of Variance of the Sociometric Scores of the
Psychosomatic and Nonpsychosomatic Groups


Source            df    Variance      F       P

Between Groups     1    14.3441     16.65    .001
Within Groups    101      .8616

Total            102


Performance in Group Activities 


In the second study it was hypothesized 
that sociometric status was related to per- 
formance or proficiency in the activities in 
which the group is engaging. The perform- 
ance index was suggested by the findings of 
Rosenberg (1954). He showed there was a 
significant relationship between time taken to 
complete the Naval Air Training Program 
and all important measures of cadet perform- 
ance during training. 


Procedure 


The Ss and the performance index. The groups 
designated in Rosenberg’s study of fiscal 1951 gradu- 
ates as fast (15 months to graduate), middles (17.4 
months to graduate), and slow (20 months to gradu- 
ate) were utilized here as the high, middle, and low 
groups on training program performance. The Ns 
for these groups were 47, 100, and 50, respectively.
Rosenberg (1954) showed that the performance in- 
dex, time taken to complete training, did not dif- 
ferentiate the high, middle and low groups on the 
selection test measures of scholastic and mechanical 
aptitude, but it effectively ranked these groups on 
the following measures of cadet performance: pre- 
flight grades, flight grades, number of flights to 
complete training, number of unsatisfactory flights, 
number of accidents, and number of board (disci- 
plinary) actions. 

The sociometric data. Sociometric data as previ- 
ously described were available on all 47 cadets in 
the high performance group, on 48 in the low group, 
and on 99 in the middle group. 


Results 


The mean sociometric scores for the high,
middle, and low performance groups were .231,
.105, and -.230, respectively; the standard
deviations were .881, .841, and .823. Table
2 presents the analysis of variance of the
sociometric scores for the three groups.


Table 2


Analysis of Variance of the Sociometric Scores of the
High, Middle, and Low Performance Groups


Source            df    Variance     F      P

Between Groups     2     2.793     3.90   .025
Within Groups    191      .717

Total            193      .738

The analysis of variance indicated that the
mean sociometric scores of the three groups
were significantly different. Comparison of
the groups by the t test showed that both the
high and middle performance groups had sig-
nificantly higher sociometric leadership scores
than did the low group. The mean for the
high group was greater than that of the mid-
dle group, although this difference was not
statistically significant. The trend over all
groups was in the expected direction: the
higher the performance index, the higher the
mean sociometric leadership score. These
results support the hypothesis that over-all 
performance or proficiency in the activities 
in which the group is engaging is a significant 
correlate of sociometric status. 

The evidence for equal aptitudes among 
groups suggests that the performance index 
may reflect an affective or motivational factor. 
Interpreting the present results as evidence 
for a relationship between sociometric status 
on leadership and an affective factor such as 
need for achievement is in keeping with the 
findings of Henry (1949), Hanawalt, Hamil- 
ton and Morris (1943), and Warner and 
Abegglen (1955). 


Aptitudes, Superiors’ Ratings, and a Forced-
Choice Self-Description Inventory


In this study it was hypothesized that rele-
vant aptitudes, superiors’ ratings, and a forced-
choice leadership inventory would correlate
significantly with sociometric status.


Procedure 


Subjects. The Ss selected for this study were the 
330 cadets in the fortieth through the forty-sixth
classes that entered the Naval Air Training Program
in 1953. 

Tests and ratings. The aptitude tests were the 
Aviation Classification Test (ACT), a measure of 
scholastic aptitude or general intelligence; the Me- 
chanical Comprehension Test (MCT), a measure of 
mechanical aptitude; and the Physical Fitness Tests 
(PFT), a measure of physical aptitude for activities
involving the large muscles. These three tests were 
administered during the first week of training. 

The superiors’ ratings utilized in this study were 
ratings of Officer-Like-Qualities (OLQ) made after 
13 weeks of training and entered in official Navy 
records. They represent an over-all rating on per- 
sonal characteristics relevant to success as a naval 
officer. 

The forced-choice personality measure utilized in 
this study was the ROTC Self Description Inventory 
developed by Brogden and his associates (Brogden, 
Machlin, Loeffler, Newkirk, & Yaukey, 1952). For 
purposes of this study navy terminology was substi- 
tuted for army terminology and the inventory was 
designated as the Navy Self Description Inventory 
(NSDI). The original inventory was empirically 
derived and validated against ROTC and West Point 
aptitude-for-service ratings. To date there has been 
no attempt to determine the specific factors meas- 
ured by the inventory and little can be said along 
this line except that the factor or factors measured 
are nonintellective in nature. The NSDI was admin- 
istered to the cadets during the thirteenth week of 
training. The sociometric measure of leadership was 
the same as that utilized in the two preceding studies. 


Results 


The relationship of each of the five hy- 
pothesized correlates to the sociometric meas- 
ure of leadership and to each other is shown 
in Table 3. All of the intercorrelations among 
the five correlates are low enough that they 
can be considered relatively independent. 
The correlation of each of the five with the
sociometric measure of leadership is significant
at the .01 level except MCT, where P < .05.


Table 3


Intercorrelations of the Self-Description Inventory
(NSDI), Physical Aptitude (PA), Scholastic Aptitude
(ACT), Mechanical Aptitude (MCT), Superiors’ Ratings
(OLQ), and the Sociometric Measure of Leadership
(SML)


(N = 330)

        NSDI    PA     ACT    MCT    OLQ    SML

NSDI            .12   -.14   -.10    .28    .27
PA                     04 22
ACT                           .38    .21    .19
MCT                                  W9     .12
OLQ                                         .66

Four of the five correlates in Table 3 are 
tests which can be administered to candidates 
prior to entering the pilot training program. 
A multiple correlation coefficient was com- 
puted to determine the effectiveness of these 
measures for predicting sociometric status and 
thus for selecting Ss with leadership potential. 
The multiple correlation was .40; after cor- 
recting for shrinkage, cR was .39. 

A second multiple correlation was computed 
to determine the amount of variance in the 
sociometric measure that could be accounted 
for utilizing all five correlates. This R was 
only .67 (cR = .66), not significantly differ- 
ent from the product moment correlation of 
.66 between sociometric status and superiors’ 
ratings. 
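
The report does not state which shrinkage formula was applied. A minimal sketch, assuming the common adjusted R-squared correction with N = 330 cases and k predictors (an assumption, not a documented detail of the analysis), gives the corrected values quoted above; the function name is illustrative.

```python
import math


def shrunken_r(r, n, k):
    """Multiple R corrected for shrinkage, given n cases and k predictors."""
    adjusted_r2 = 1.0 - (1.0 - r ** 2) * (n - 1) / (n - k - 1)
    # A negative adjusted value is conventionally reported as zero.
    return math.sqrt(max(adjusted_r2, 0.0))


four_tests = shrunken_r(0.40, n=330, k=4)     # the four selection tests
all_five = shrunken_r(0.67, n=330, k=5)       # all five correlates
print(round(four_tests, 2), round(all_five, 2))  # 0.39 0.66
```

With 330 cases and only four or five predictors, the correction changes R by about .01, which is why the corrected and uncorrected coefficients are nearly identical.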

Further examination of the correlation ma- 
trix in Table 3 affords some noteworthy obser- 
vations about the personality measures under 
consideration. By far the highest product 
moment correlation was between officers’ rat- 
ings of the cadets (OLQ) and sociometric 
status or the cadets’ ratings of each other 
(SML). This product moment correlation 
(.66) was considerably higher than the mul-
tiple correlation (.40) of all the other vari-
ables (aptitude tests and self-description
inventory) with sociometric standing. These
test and inventory indices correlated almost
identically with superiors’ ratings and socio-
metric ratings. The first order correlation of
.66 was essentially equal to the multiple cor-
relation of .67 between all the correlates and
sociometric status. It follows that all the
test and inventory measures together only
account for part of the variance common to supe-
riors’ ratings and the sociometric measure.
This is a situation where “subjective” (socio-
metric) ratings based on direct observations
of behavior are quite superior to “objective” 
(test, inventory) measures in predicting a 
criterion. 

Summary 


The three studies presented in this paper 
were designed to ascertain some personality 
correlates of sociometric status. In the first 
two studies, sociometric status was validated 
against rather holistic behavioral indices:
one based on psychogenic factors in health,
the other on performance in the activities in 
which the group was engaging. Groups of Ss 
categorized in terms of these indices differed 
significantly on the sociometric measure. 
These findings were interpreted as evidence 
supporting the frequently made but infre- 
quently tested assumption that sociometric 
measures reflect meaningful personality vari- 
ables which can be reliably measured in terms 
of observable behavior. 

The third study examined the relationship 
of three psychometric indices of personality, 
physical aptitude, and superiors’ ratings to 
sociometric status. All five of these measures 
correlated positively and significantly with 
sociometric status. The four tests which are 
presently usable in selection yielded a mul- 
tiple correlation of .40 with sociometric status 
measured in terms of leadership. This points 
toward the feasibility of developing a test 
battery for the selection of individuals with 
leadership potential. 

In studying the relative effectiveness of the 
various correlates in accounting for the vari- 
ance in sociometric status, superiors’ ratings 
were better than the four test and inventory 
indices combined. However, superiors’ rat- 
ings and sociometric status have to be con- 
sidered as essentially concomitant criteria of 
personality variables (in the present case as 
intermediate criteria of leadership), while the 
test and inventory measures may be consid-
ered as predictors of these criteria.


THE PROBLEM OF PRESELECTION IN WEIGHTED
APPLICATION BLANK STUDIES


JAMES H. MYERS AND WADE ERRETT
Prudential Insurance Company, Los Angeles


The weighted application blank has come
into wide use as a selection tool in recent
years. Generally, the development of such a
tool proceeds by establishing criterion groups
(good vs. poor employees, terminated vs.
present employees, etc.) and comparing these
groups on a number of biographical data or
personal history items (age, marital status,
education, etc.). Weights are assigned to
each item in accordance with its ability to
discriminate between the criterion groups.
Weights so developed are applied to new
applicants to predict later success.

In weighted application blanks, as in mental
testing, the problem of preselection arises.
For validation purposes, investigators are
limited to persons actually hired. However,
a careful search of the literature fails to
reveal a single weighted application blank study
where any attention has been given to the
amount of preselection which has occurred,
not to mention the effect this should have
upon the design of the final instrument for
screening applicants.


Preselection


Table 1 illustrates the amount and degree
of preselection which can occur in a practical
situation. These data are based upon 291
applications for clerical jobs in Prudential’s
Western Home Office in Los Angeles. It is
interesting to note that of the 19 biographical-type
items, 10 were already being used as
a basis for selection, with confidence limits at
or beyond the .001 level!


Table 1


Discrimination Levels of Bio-Data Items


                                                         Hired vs.    Terminated vs.
                                                         Nonhired     Nonterminated
Item                                                         p              p

 1. Occupation                                              .30            .20
 2. Courses liked best in High School                       .70            .02
 3. Attendance at Business or Technical School             <.001           .30
 4. Would like to go to College now                        <.001           .20
 5. Plan to go to College in 5 years                        .10            .001
 6. Number of acquaintances working at Prudential          <.001          <.001
 7. Number of close friends working at Prudential           .01            .02
 8. Amount of money needed for living expenses             <.001           .30
 9. Number of friends attending college                     .50            .50
10. Years at present address                                .001           .10
11. Years in Los Angeles                                   <.001           .30
12. Years in California                                    <.001           .50
13. If hired, when available for work                       .50            .50
14. Expected starting salary                               <.001           .80
15. Expected salary after one year                         <.001           .70
16. Difference between expected start and year-end salary   .20            .02
17. Reason for selecting Prudential                        <.001           .20
18. Weight                                                  .95            .20
19. Height


Table 1 also shows that, among persons
actually hired, only five of the 19 items (Nos.
2, 5, 6, 7, and 16) were found to discriminate
between terminated and nonterminated employees
at or beyond the .05 level of confidence.
The usual procedure would call for
applying the weights from these five items to
all incoming applicants as an integral part of
the selection process. However, such a procedure
assumes no predictive value for the 10
items on which significant preselection has
occurred. To the extent that these items do
have validity, the final selection instrument
described above will be less than maximally
effective. As a matter of fact, if the five items
were to replace the existing selection procedures,
they could actually worsen the selection
process by not allowing the preselection
items to operate.

An example will help clarify this. It can 
be seen from Table 1 that neither “years in 
Los Angeles” nor “years at present address” 
distinguished turnover proneness in the group 
actually hired. One would “logically” expect 
less turnover among settled members of the 
community. However, less settled applicants 
were very probably not hired, judging from 
the significant amount of preselection indi- 
cated for these items. Yet, the five-item scor- 
ing key would make no provision for screen- 
ing these applicants out. 

At least one of the 10 preselection items 
(number of acquaintances working at Pru- 
dential) was found to be predictive of turn- 
over at the .001 level, even among the greatly 
restricted range found in the group actually 
hired. The failure of the other nine items to 
predict among the employed group may well 
be due in some measure to the significant pre- 
selection. This could also account for the 
fact that “logical” items often fail to predict
in other weighted application blank studies. 


Discussion 


It is, of course, impossible to determine the 
predictive value of preselection items without 
hiring all applicants for a period of time and 
following their later progress. Clearly, how- 
ever, something should be done to make the 
final weighting system as effective as possible. 
Three possibilities suggest themselves: 

1. Apply the weights developed in the
usual manner only to those applicants who
have survived all steps of the normal screening
processes; i.e., allow preselection to operate
prior to utilizing the weights. This would
be the simplest procedure.

2. Apply “restriction in range” corrections 
to individual item validity coefficients, where
assumptions can be met. This, of course, can 
be done only where the items are continuous 
(age, height, weight) and not discrete (mari- 
tal status, occupation). 

3. Develop “preselection weights,” based 
upon differences between those hired and 
those rejected. These weights could then be 
used in one of two ways: 


a. In a two-stage screening process. Pre- 
selection weights would be applied first 
to incoming applicants, to predict whether 
or not they would have passed the nor- 
mal screening process. Then weights 
based on those actually hired could be 
applied only to those passing the initial 
screening, to further refine the predic- 
tion of later progress. 

b. Both sets of weights (preselection and
usual weighted application blank weights) 
for any given item could be combined 
to produce a single weight. This would 
be generally satisfactory where both 
weights were of the same sign (+ or —), 
but would present problems in the case 
of opposite signs. 
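
Possibility 2 can be illustrated with the familiar correction for direct restriction in range on the predictor. The sketch below is not taken from the study: it assumes the standard formula R = ru / sqrt(1 - r^2 + r^2 u^2), where u is the ratio of the applicant-group standard deviation to the hired-group standard deviation, and every number in the example is hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Estimate the validity in the full applicant group from the hired group.

    Assumes direct selection on the predictor, linearity, and homoscedasticity.
    """
    u = sd_unrestricted / sd_restricted
    numerator = r_restricted * u
    denominator = math.sqrt(1.0 - r_restricted ** 2 + (r_restricted * u) ** 2)
    return numerator / denominator


# HYPOTHETICAL values: an item validity of .20 among those hired, with the
# applicant standard deviation twice that of the hired group.
corrected = correct_for_range_restriction(0.20, sd_unrestricted=6.0, sd_restricted=3.0)
print(round(corrected, 3))
```

When the two standard deviations are equal the function returns the observed coefficient unchanged, which is the behavior expected when no preselection has occurred on the item.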


The type and degree of preselection being 
used is known only at the time of a study. It 
is primarily a function of the emphases of the 
employment interviewers (within the frame- 
work, of course, of fair employment prac- 
tices). Since these emphases can change at 
any time and alter, or even reverse, the pre- 
selection items in use, No. 3 would appear to 
be the best way to handle the prescreening 
items. 
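
The two-stage use of preselection weights described under No. 3a might be sketched as follows. Everything specific in this example is hypothetical: the item names are adapted from Table 1 for illustration, and the weights, response categories, and cutoffs are invented, since the study reports only significance levels and not the weights themselves.

```python
def weighted_score(responses, weights):
    """Sum the weights attached to an applicant's item responses."""
    return sum(weights[item].get(answer, 0)
               for item, answer in responses.items()
               if item in weights)


def two_stage_screen(responses, preselection_weights, hired_weights,
                     preselection_cutoff, hired_cutoff):
    """Stage 1 mimics the normal screening; stage 2 refines among survivors."""
    if weighted_score(responses, preselection_weights) < preselection_cutoff:
        return "rejected at stage 1"
    if weighted_score(responses, hired_weights) < hired_cutoff:
        return "rejected at stage 2"
    return "accepted"


# HYPOTHETICAL weights, for illustration only.
PRESELECTION_WEIGHTS = {
    "years_in_los_angeles": {"under 1": -1, "1 to 5": 0, "over 5": 1},
}
HIRED_WEIGHTS = {
    "acquaintances_at_company": {"none": -1, "1 or 2": 0, "3 or more": 1},
}

settled = {"years_in_los_angeles": "over 5", "acquaintances_at_company": "3 or more"}
newcomer = {"years_in_los_angeles": "under 1", "acquaintances_at_company": "3 or more"}

print(two_stage_screen(settled, PRESELECTION_WEIGHTS, HIRED_WEIGHTS, 1, 1))
print(two_stage_screen(newcomer, PRESELECTION_WEIGHTS, HIRED_WEIGHTS, 1, 1))
```

Keeping the two sets of weights separate, rather than combining them as in No. 3b, avoids the difficulty of items whose preselection and post-hire weights differ in sign.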

It should be noted that the approaches to 
handling this problem suggested above assume 
validity for the items on which significant 
preselection has occurred. With lack of evi- 
dence to the contrary, and with some to the 
affirmative (e.g., “number of acquaintances” 
item), it seems to us that this course of ac- 
tion is safer than assuming no validity for 
preselection items. 


Received May 13, 1958.


EFFECTS OF NOISE ON HUMAN PERFORMANCE 1


HARRY J. JERISON 
Antioch College 


Until about 1948, the only proper answer 
to a question on possible effects of noise on 
nonauditory performance would have been 
that none had been demonstrated. Kryter
(1950), who reviewed the experimental evi- 
dence available then, concluded that nearly 
all, if not all, studies showing deleterious ef- 
fects of noise could be criticized severely on 
the basis of faulty procedures. Since that 
time, Broadbent (1953, 1954) has demon- 
strated changes in working efficiency on tasks 
involving vigilance (alertness) and on a self-
paced or externally paced serial reaction task
provided the tasks were performed without 
interruption for relatively long time periods. 
The experiments to be described confirm 
Broadbent’s results on vigilance and indicate 
additional measurable performance changes 
in relatively high energy noise fields. 


General Procedure 


In the three experiments to be reported here 
the general procedure was to run Ss individu- 
ally through three work sessions with one- 
week intervals between sessions. Subjects 
were paid volunteer male undergraduates. 
After all of the Ss for a particular experi- 
ment were chosen they were assigned ran- 
domly to two subgroups. The subgroups were 
constituted to counterbalance order effects, 
and the order of undergoing various procedures
is indicated in Table 1. The training session,
Session I, was one hour long for Experiment I
on vigilance and two hours long for
Experiments II and III.


1 This article is based on a paper presented before
the Aero Medical Association in April 1956. It re-
ports the results of experiments performed in 1954
and 1955 while the author was at the Psychology
Branch, Aero Medical Laboratory. The preparation
of this report was supported by the United States
Air Force under Contract No. AF 33(616)-6095,
monitored by the Aero Medical Laboratory, Direc-
torate of Laboratories, Wright-Patterson Air Force
Base, Ohio.

The advice and criticisms of Virginia L. Senders
and W. Dean Chiles on various phases of the experi-
ments reported here are gratefully acknowledged.
The author is also indebted to Arden K. Smith, Ben-
jamin Chi, and Shelley Wing who served as research
assistants.


The designation “quiet” in Table 1 refers 
to a noise that was used to mask the sounds 
of equipment. In Experiment I this was 
about 83 db re .0002 dyne/cm², and in Ex-
periments II and III it was about 77.5 db.
The designation “noise” refers to the high 
level noise which was our major concern. In 
Experiment I it was about 114 db, and in Ex- 
periments II and III it was about 111.5 db. 
A spectral analysis of the noise is presented 
in Fig. 1. The noise was generated elec- 
tronically and broadcast by a loudspeaker 
mounted in the S’s room. 
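
To give a sense of scale for these levels, the following sketch converts a sound pressure level in db re .0002 dyne/cm² into the corresponding pressure, using the standard definition level = 20 log10(p / p0). The function names are illustrative and are not part of the original report.

```python
REFERENCE_PRESSURE = 0.0002  # dyne/cm^2, the reference stated in the text


def spl_to_pressure(db):
    """Sound pressure (dyne/cm^2) for a level in db re .0002 dyne/cm^2."""
    return REFERENCE_PRESSURE * 10 ** (db / 20.0)


def pressure_ratio(db_high, db_low):
    """Ratio of sound pressures corresponding to two levels in db."""
    return 10 ** ((db_high - db_low) / 20.0)


# Levels reported for Experiment I ("noise" and "quiet").
print(round(spl_to_pressure(114), 1), round(spl_to_pressure(83), 2))
print(round(pressure_ratio(114, 83), 1))
```

On this definition the 31 db separation between "noise" and "quiet" in Experiment I corresponds to a sound pressure roughly 35 times greater.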


Method and Results


Experiment I: Noise and Vigilance


The purpose of this experiment was to check 
Broadbent’s previously reported results that per- 
formance on a prolonged vigilance task was poorer 
in noise than in quiet. The S’s task was to monitor 
a panel of three Mackworth-type clocks (cf. Mack- 
worth, 1950) and to press a response switch under 
a clock when its hand stepped through twice its 
usual excursion. The apparatus is illustrated in 
Fig. 2. Double steps occurred haphazardly at intervals
that averaged about once a minute for each
clock.


The results of this experiment are summarized
in Fig. 3 which gives the average percentage
correct for the nine Ss of this experiment
during their experimental and control
sessions. It should be noted that average performance
during these two sessions when noise
levels were the same, that is, during the first
half hour, was about 10 per cent better during
the control session. The difference between
the sessions during the second and third
half hours when the 114 db noise was present
for the experimental session should, therefore,
not be attributed to an effect of noise.
The parallel orientation of the two curves
during the first one and one-half hours indicates
that noise had essentially no effect on
performance at that time. During the fourth
half hour the two curves diverge considerably,
suggesting that noise may depress performance
only after a fairly considerable period
of time.

An analysis of variance of the data of this
experiment is presented in Table 2. The difference
between average performance during
the experimental and control sessions was not
statistically significant (.20 > P > .10). The
difference between rate of change of performance
for the two sessions (the sessions by time
at work interaction) was significant at the .05
level. This supports the impression one gets
from viewing Fig. 3 that the differentiation
of performance in the fourth half hour is a
“true” effect. A more detailed report of this
experiment has been prepared for limited
circulation (Jerison & Wing, 1957).

Before going on to the next experiments it
is of some interest to note that vigilance as
measured here did not become less adequate
as a result of fatigue alone. This result, the
absence of a performance decrement during
the two-hour control session in quiet, is contrary
to that reported by Mackworth (1950)
for a simpler vigilance task. No explanation
for this discrepancy will be attempted here;
it is discussed in greater detail elsewhere
(Jerison & Wing, 1957) and has been found
again in a subsequent experiment with the
same task (Jerison & Wallis, 1957).


Table 1

General Experimental Design


              Session I            Session II                 Session III

Subgroup QN   Training             Control                    Experimental
              (Quiet throughout)   (Two hours quiet)          (1/2 hour quiet followed
                                                              by 1 1/2 hours noise)

Subgroup NQ   Training             Experimental               Control
              (Quiet throughout)   (1/2 hour quiet followed   (Two hours quiet)
                                   by 1 1/2 hours noise)

Note. Sessions were held at one-week intervals.


Fig. 1. Octave band analyses of noise used in
these experiments. Upper curves are of “Noise” in
Experiment I (solid line) and Experiments II and III
(dashed line). Lower curves are of “Quiet.”


Fig. 2. The display and response panels of Experiment I.
Dial pointers normally stepped through
3.5 degree arcs.


Fig. 3. Average performance of the nine Ss in Experiment I
during successive half hours of the experimental
and control sessions.


Table 2


Analysis of Variance for Experiment I


Source                          Mean Square

Subjects (S)
Experimental conditions (E)
E × S
Clocks (C)
Time at work (T)
T × S
E × C
E × T
Ss
xT
E × C × T × S

Total

* Significant at the .05 level.
** Significant at the .01 level.


Experiment II: Noise and Complex Mental 
Counting 


The procedure in this experiment was developed
as a result of a suggestion by Miles (1953) that Ss
working in high energy noise fields could not keep
an accurate count of how far they had gone in a
repetitive task. The complex mental counting test
is described in detail elsewhere (Jerison, 1955).
Briefly, it consists of a display of three periodically
flashing lights; the S’s task was to count the number
of times each light flashed and to maintain separate
counts for each light. He responded by pressing
a button under a light when that light had
flashed N times and began the count for that light
again. (For this experiment N was always 10.)
The display and response panels used in this experiment
are illustrated in Fig. 4. Behind the display
is the loudspeaker which broadcast the noise.
Fourteen Ss were used.


Fig. 4. The display and response panels of Experiment II.
Behind the display is the loudspeaker
cabinet.


Fig. 5. Performance of the 14 Ss of Experiment II given
separately for the seven-subject subgroups “QN” and “NQ”
during successive half hours of the experimental and
control sessions.


The most relevant results of this experiment
are presented in Fig. 5 which shows the
average percentage of correct responses for 
the two subgroups separately for the second 
and third sessions. Subjects in subgroup QN 
showed no change in performance during 
successive half hours of the second (quiet 
throughout) session. In the third session, 
when the noise level was raised to 111.5 db 
after the first half hour, a small decrement 
appeared, though the performance curve is 
relatively flat. Subjects in subgroup NQ 
showed a steady decrement from their high 
performance level of the quiet first half hour 
of their second (experimental) session after 
the noise level was raised, with a total fall in 
performance of over 25%. In the third (con- 
trol) session in quiet this group repeated the 
pattern showing a drop in performance of 
about 20%. This general effect (the sessions 
by experimental conditions by time interac- 
tion) was significant at the .001 level. A 
summary of the rather lengthy analysis of 
variance for this experiment is presented in a 
more detailed report for limited circulation 
(Jerison, 1956). 

This result suggests that working on this 
tedious and difficult task for two hours under 
the QNNN regime conditioned Ss to a pro- 
gressive breakdown of performance, and this 
conditioning was maintained in the subse- 
quent quiet session. Working in quiet first, 
on the other hand, appeared to dispose the 
Ss toward maintaining their original perform- 
ance level, and this tendency, too, was main- 
tained in the subsequent session despite the 
presence of noise in that session. Recent ex- 
periments by Broadbent (1957, 1958) ap- 
pear to support this finding. 


Experiment III: Noise and Time Judgment 


While performing the counting task the Ss of Ex- 
periment II were also required to press a telegraph 
key (illustrated in Fig. 4, lower right) at what they 
judged to be 10-minute intervals. 


The main results of Experiment III are
summarized in Fig. 6 which shows the average
time between S’s responses during successive
half hours of the experimental and
control sessions. (The subgroups were combined,
because no order effect appeared here.)
The results were analyzed with t tests. The
differences between half hours within the control
session were not statistically significant,
nor was the difference between time judgments
in the first half hours of the control
and experimental sessions significant. The
differences between the first half hour and
succeeding half hours of the experimental
session were all significant at the .05 level or
better, and the difference between the averaged
judgments of the last one and one-half
hours of the control and experimental sessions
was significant at the .02 level. In other
words, a significant difference was found between
time judgments as measured in this
experiment when the comparison was between
judgments in noise and judgments in
quiet. A more detailed report of this experiment
for limited circulation has appeared elsewhere
(Jerison & Smith, 1955).


Fig. 6. Time judgments for the experimental and
control sessions of Experiment III during successive
half hours.


Discussion 


It is clear that noise produces readily measurable
changes in human performance. The
specific changes involved in the three experi- 
ments described here are discussed in detail 
in each of the technical reports devoted to 
them (Jerison, 1956; Jerison & Smith, 1955; 
Jerison & Wing, 1957). The purpose of the 
present discussion is to consider these results 
in a more general way and to seek some constant
features that appear in all of them.

One of the first problems to face is why
it has been possible to demonstrate differ- 
ences between performance in noise and in 
quiet at all, for, as indicated earlier (cf. Kry- 
ter, 1950), most previous work on this prob- 
lem has given negative results. The main 
new feature that appears in these experiments 
is one suggested by Mackworth (1950) and 
by Broadbent (1953, 1954): Performance 
was measured over long time periods and 
conditions were arranged to allow effects of 
boredom and fatigue to interact with possible 
effects of noise. These conditions were pres- 
ent in all the experiments reported here. The 
implication is that for short, spurt-like efforts 
no performance decrements in noise need be 
expected. When sustained performance is re- 
quired, however, and the task is not intrin- 
sically challenging, effects of the sort reported 
here are likely. 

These considerations point to an interpre- 
tation of the results which deemphasizes the 
importance of noise. There is, after all, little 
reason for regarding noise as a peculiar kind 
of devil which produces such unusual inter- 
actions with fatigue and boredom. It seems 
reasonable, instead, to regard the more gross
effects found as resulting from effects of noise
on motivational level or emotional balance,
in short, from noise as a source of psychological
stress. If this interpretation is correct
we should expect similar behavioral effects 
from other experiments in which other kinds 
of stress or motivating conditions were inves- 
tigated. This is, in fact, the case. Mack- 
worth (1950) demonstrated that heat stress 
resulted in deterioration of performance on a 
simple vigilance task, and several experiments 
showing changes in the judgment of time 
intervals of the order of minutes as a result 
of different motivating conditions have been 
reported (Filer & Meals, 1949; Gulliksen, 
1927; Rosenzweig & Koht, 1933). 

Because stress has been introduced as an 
explanatory concept a few remarks on its 
scientific status are in order. The review by 
Lazarus, Deese, and Osler (1952) emphasizes 
the lack of systematic research on effects of 
stress on performance, and, although it at- 
tempts an analysis of theoretical approaches, 
this review does not go significantly beyond 
a statement relating psychological stress to
changes in motivation and emotion. There
is danger, when using the concept of stress, 
of believing that an explanation has been 
achieved. Actually, here, and in most other 
contemporary usages of the term, we have 
achieved little more than communication of 
intuitive judgment about the kind of situation 
with which we are dealing. 

A final point that should be made is related 
to the kind of noise used. The noise was 
actually much softer than that found today 
in many operational situations. Yet even at 
these levels it was clear that “higher mental
processes” were affected. It is obviously
necessary to explore effects of noises of higher 
intensity on such processes. 


Summary 


The results of three experiments relating 
performance changes to noise levels are re- 
ported. Noise levels used were about 80 db 
representing “quiet” and 110 db representing 
“noise.” Changes in alertness as determined 
on a clock-watching task were found after 
one and one-half hours in noise though none 
were found in quiet. Time judgments—the 
estimation of the passage of 10-minute inter- 
vals—were distorted by noise; Ss responded 
on the average of every nine minutes in quiet 
and every seven minutes in noise when in- 
structed to respond at what they judged to 
be 10-minute intervals. A significant but
complex effect of noise on a mental counting 
task was also found. These effects are dis- 
cussed in terms of noise as a source of psycho- 
logical stress. 


Received May 19, 1958.


CUES USED BY RATERS IN THE RATING OF
TEMPERAMENT REQUIREMENTS OF JOBS ¹


JEWELL BOLING AND SIDNEY A. FINE


U. S. Employment Service 


This study was designed to determine if 
word and phrase cues in job definitions could
be standardized to achieve homogeneous con- 
cepts and interrater agreement in the rating 
of so-called “temperament” requirements of 
jobs. These ratings are part of a larger re- 
search project being carried out by the occu- 
pational research program of the national 
office of the United States Employment Serv- 
ice (Studdiford, 1953). In 1950 the USES 
began a research project designed to develop 
a new occupational classification structure.
This structure, it was felt, should reflect for 
jobs the common worker trait requirements, 
such as aptitudes, interests, and tempera- 
ments. The design of the over-all project 
required that judgments about such worker 
requirements be made basically from the job 
definitions in the Dictionary of Occupational 
Titles (U. S. Dept. of Labor, 1949). Four 
thousand jobs were used in the research (U.S. 
Dept. of Labor, 1956). The problem was how 
to infer temperament requirements from the 
descriptive information in the Dictionary 
using as a definition of temperaments “those 
personality qualities which remain fairly con- 
stant and which reveal a person’s intrinsic 
nature.” 

A series of studies was undertaken designed 
to establish an adequate basis for making 
such judgments. These studies, which turned 
out to be essentially semantic in nature, were 
carried out in three stages. The first stage 
involved an attempt to use the concepts of 
temperaments available in the literature and 
to develop the word and phrase cues in job 
definitions which were related to them. 
Seven raters applied these concepts to the 
rating of a 50-job sample. The second stage
involved studies to determine the best way
to formulate the factor concepts in terms of
the cues obtained in the first rating. In the
third stage a second sample of 50 jobs was
rated by 10 raters according to the revised
formulations.


¹ This study was carried out in 1950 and 1951.
More data than are printed in this report are available
by writing the U. S. Employment Service, Division
of Placement Methods, Washington 25, D. C.


First Stage 


The literature, primarily Cattell and All- 
port (Allport, 1943; Allport & Odbert, 1936; 
Cattell, 1946), yielded 14 traits, defined es- 
sentially in terms of the characteristics of 
people. Although most of the names of the 
factors were found in the literature, some were 
contrived. Since some of the factors appeared 
to be bipolar and this notion was supported 
by Cottle (1950), the 14 factors were ar- 
ranged in seven bipolar pairs and defined as 
below. 


Definitions of Temperaments—First Rating 


Adaptability to Routine vs. Versatility 
Dominance vs. Submissiveness 
Self-Control vs. Uninhibitedness 
Gregariousness vs. Self-Sufficiency 
Objectivity vs. Subjectivity 
Creativity vs. Non-Imaginativeness 
Rigorousness vs. Valuativeness 


Sample of Definitions
with Illustrative Jobs by Title


Self-Control: Disposition toward emotional control
necessary to maintain standard work performance
when confronted with critical, annoying or unusual
situations.

Surgeon [remaining illustrative titles illegible]


Uninhibitedness: Disposition to act without
restraint of emotions; inclination to react impulsively
to varying situations without attempting
to control excitement or tension.

No illustrative jobs.


Fifty jobs were rated by seven raters ac- 
cording to these definitions. The raters were 
instructed to select two traits that together 
best expressed the temperament pattern of 
the job and to justify their ratings by indi- 
cating the cues in the definitions which 
prompted them. One factor, Uninhibitedness,
was ruled out as a possible rating, since it
was felt that it did not occur as a require- 
ment in jobs. Thus 72 possible patterns could 
be rated. It was decided in advance to use 
the patterns on which the majority of the 
raters agreed as the criterion against which 
to measure agreement. Four or more raters 
agreed on a common pattern for 28 out of 
the 50 jobs. On only one out of the 50 were 
there no agreements of two or three raters. 
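
The figure of 72 possible patterns is consistent with a simple count, sketched below in Python. The sketch rests on an assumption the text does not state explicitly: that a pattern is any pair drawn from the 13 ratable traits, except a pair made up of a trait and its own bipolar opposite. The function name is illustrative only.

```python
from itertools import combinations

# The seven bipolar pairs listed under "Definitions of Temperaments".
BIPOLAR_PAIRS = [
    ("Adaptability to Routine", "Versatility"),
    ("Dominance", "Submissiveness"),
    ("Self-Control", "Uninhibitedness"),
    ("Gregariousness", "Self-Sufficiency"),
    ("Objectivity", "Subjectivity"),
    ("Creativity", "Non-Imaginativeness"),
    ("Rigorousness", "Valuativeness"),
]

# Uninhibitedness was ruled out as a possible rating.
EXCLUDED = {"Uninhibitedness"}


def possible_patterns(pairs, excluded):
    """Return all two-trait patterns, skipping excluded traits and
    (by assumption) pairings of a trait with its own opposite."""
    traits = [t for pair in pairs for t in pair if t not in excluded]
    opposites = {frozenset(pair) for pair in pairs}
    return [c for c in combinations(traits, 2) if frozenset(c) not in opposites]


patterns = possible_patterns(BIPOLAR_PAIRS, EXCLUDED)
print(len(patterns))  # 78 pairs of 13 traits, less 6 opposite pairs
```

Under this reading, 13 traits give 78 unordered pairs, and removing the six remaining trait-with-opposite pairings leaves 72.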


Second Stage 


The second stage began with the analysis 
of the word and phrase cues developed in the 
first stage. In order to more precisely deter- 
mine which cues were operating for each 
temperament trait, jobs were selected where 
(a) all raters saw the trait, (b) some did and
some didn’t, and (c) only one saw it or didn’t 
see it. Raters in these three instances were 
asked by questionnaire and interview to again 
justify their ratings. In addition, cues were 
obtained from three new raters rating the 
same jobs. 

All of these cues were assembled for each 
trait. Typical of the results are the cues 
used for Dominance which reflect the double- 
barreled nature of the definition: “Disposition 
to prevail, control, be at the ‘helm’; desire 
for tasks involving planning, determining pro- 
cedures, directing and organizing activities 
and/or influencing or directing the actions of 
others by suggestion, persuasion, or com- 
mand.” 

Cues such as “influences emotion of audi- 
ence by singing” and “influences the actions 
of others by suggestion and persuasion
through writing original descriptive advertis- 
ing copy” picked up the “influencing” part 
of the definition. Other cues such as “is in 
complete charge of stables” and “plans and 
organizes advertising activities” picked up the 
“control” part of the definition. 

Interviews with the raters and qualitative 
examination of their rating justifications pro- 
vided considerable insight into the judgmental 
processes which operated in the ratings. Because
the raters were given definitions for temperament
traits in terms of people and illustrated
by job titles, each rater had to reason
from the definition of the trait to the illustrative
job titles, then to the specifics in the
illustrative jobs, and finally to the specifics 
in the job being rated. To short-cut this 
involved process the rater in many cases sim- 
ply picked up the most obvious word or 
phrase cue in the trait definition and general- 
ized from it. The job titles used as illustra- 
tions were of little help since they did not 
specify what, in the definitions for those jobs, 
illustrated the trait. 

The second phase of the development of 
the temperament concepts got under way at 
this point. This involved setting out to define 
temperament concepts, not as traits in people, 
but as situations calling for those traits illus- 
trated with specific examples from job defini- 
tions. The language of these illustrative 
situations from the Dictionary of Occupa- 
tional Titles was revised to interpret the con- 
tent in such a way as to show how the tem- 
perament requirement was operating. To see 
if these illustrative situations containing the 
cues could now be consistently related to the 
revised, but simplified, temperament defini- 
tions, the First Matching Study of cue job 
situations was conducted. 

Seven occupational analysts with no pre- 
vious training or work in temperament analy- 
sis were given 13 simplified factor definitions 
with their trait names and 50 illustrative cue 
situations with the title of the job definition 
in which they occurred. Nine graduate stu- 
dents in a personnel psychology class were 
given the same factor definitions without 
names but identified by letters and the illus- 
trative cue situations without job titles as 
indicated below: 


Trait H—Situations involving performing 
adequately under stress when 
confronted with the critical or 
unexpected, taking risks, or hav- 
ing responsibility for the safety 
of others. 


Works below the surface of the water, 
dressed in diving suit and helmet, to drill 
holes in rock for blasting purposes at the bot- 
tom of lake, harbor, or other body of water. 
Risk of suffocation from fouled air hose or 
entrapment by rotten, falling timbers is al- 
ways present. 


Operates upon the human body, incising
the flesh with very sharp bladed scalpels and 
using the fingers to manipulate organs and 
tissue. Exercises constant care, with no de- 
flection of attention regardless of distractions, 
to avoid any of several injuries or damages 
to the patient which would otherwise almost 
invariably occur. 


Observes passengers during flight to detect 
signs of discomfort, engaging nervous pas- 
sengers in conversation to allay their fears 
and apprehensions, setting an example of 
calm, untroubled demeanor. 


In both instances the instruction was to 
indicate for each cue situation the factor 
which best applied to it, and also to indicate, 
if necessary, second and third choices in rank 
order. 

The average percentage of agreement with 
the criterion was 79% for the occupational 
analysts and 69% for the students. How- 
ever, the Pearson product-moment correlation 
coefficient of .50 with a standard error of .11 
between the two groups of raters indicated 
that, although their over-all matching of the 
situations with definitions was comparable, 
they didn’t agree too well on the same fac- 
tors. Thus, the analysts saw Dominance in 
the cues that were supposed to illustrate it 
but not so the students. The reverse was true 
for Subjectivity with a mixed result in the 
cases of Rigorousness and Versatility. For 
example, one of the situations for Rigorous- 
ness was: 


Sculptor—“Making models and carving 
statues requires patience and painstaking 
endeavor in order to achieve a work of art 
with desired line and proportion.” 


Nevertheless, two analysts rated this situa- 
tion for Creativity because of the title and 
the fact that a Sculptor is an artist. One 
rated it for Self-Control because of the word 
“patience.” 

For another illustrative situation of Rigor- 
ousness: 


Surgeon—“Because of the value placed on
human life and responsibility residing in
the surgeon, he must exercise utmost care
in performance,”


three analysts rated Self-Control, simply generalizing
from the title, although another situation
taken from this same job was written up
to be a criterion situation for Self-Control.
It read as follows: 


Surgeon—“In performance of surgery,
worker is confronted with emergency and/ 
or critical situations, which require him to 
remain calm and collected; if not, the pa- 
tient’s life is endangered.” 


All the analysts and students agreed on
rating this item for Self-Control. 

This type of analysis revealed other diffi- 
culties with the situations. The factor Crea- 
tivity was defined thus: “Disposition to or- 
ganize feelings or knowledge into new images, 
systems, or practical constructions.” One of 
the situations used in the First Matching 
Study read: “Originality or the application of 
imagination important in developing new 
ways of expressing the dance created by an- 
other, or in creating a new dance.” This 
situation was supposed to illustrate the trait 
Creativity but some of the raters saw in it 
more strongly the trait Subjectivity: “Disposition
to interpret phenomena in terms of
personal viewpoint; desire for situations 
which necessitate or permit injection of self.” 
Follow-up on the cues which mediated their 
judgments revealed that the words “origi- 
nality” and “imagination” appearing in the 
situation for Creativity influenced a rating 
for Subjectivity rather than Creativity. 

Another example of how a word in a situa- 
tion illustrating a trait operated as a short-cut 
cue to another trait, bypassing the basic trait 
concept involved in the situation, was the 
word “research” in a situation for Creativity. 
It influenced a rating for Valuativeness. In 
a situation for Valuativeness, the word “data” 
influenced a rating for Objectivity, the definition
of which contained the word “data.”
The word “scientific” in a situation for Crea- 
tivity influenced a rating for Objectivity. A 
situation for Valuativeness, “Must be able 
to judge ‘temper’ of crowd,” led some raters 
to rate Gregariousness because of the word 
“crowd.” 

Several conclusions were drawn from this 
First Matching Study. 

First, certain classes of cues in particular
produced associative rather than analytical
thinking. These were job titles, the names of 
traits, and the interpretations in the sample 
situations that were supposed to show the 
temperament requirement. This associative 
thinking did not help reliability. 

Second, the definition of a trait needed to 
be expressed through a range of situations as 
actually worded in job descriptions and thus 
illustrate the factor from as many aspects as 
possible; that is, although the wording of any 
one situation could be interpreted to reflect 
several traits, the concept of a particular trait 
had to be established in an over-all situational 
context. 

On the basis of these conclusions and study 
of the cues used by the raters in rating the 
first 50 jobs and the First Matching Study, 
the factors were once again redefined. This 
time they were defined as types of situations 
calling for traits (rather than as traits them- 
selves), and illustrated with a range of sam- 
ple situations, worded as in job descriptions. 
Thirteen factors emerged in the form origi- 
nally used with the graduate students men- 
tioned earlier, with the following changes: 
Creativity was merged with Subjectivity; 
Dominance was divided into Executiveness 
and Persuasiveness; Non-Imaginativeness was 
merged with Adaptability to Routine; and
Isolativeness was broken out of Self-Sufficiency.
However, it should be noted that
these titles too were subsequently dropped for 
letter designations. 

The Second Matching Study was under- 
taken to see if these factor definitions and 
groups of situations could be matched when 
scrambled. Two groups of occupational ana- 
lysts were used for this purpose—10 experi- 
enced and four inexperienced in temperament 
analysis. 

Table 1 partially indicates the results of
this study. With the exception of Self-Adequacy,
either nine out of 10 or all 10 experienced
analysts correctly related the group
of situations to the factor definitions. Similarly,
with the exception of Self-Adequacy
and Versatility, three out of four or all
four of the inexperienced analysts correctly
related the group of situations to the factor
definitions. With minor exceptions only the
inexperienced analysts saw other than the
criterion factor definitions as covering the
groups of situations, but the definition and
the cues that were responsible for this became
apparent. For example, the definition for
Self-Adequacy was rated as covering the
groups of situations for Objectivity, Versatility,
Executiveness, Isolativeness, Valuativeness,
and Subjectivity. Analysis showed that
this was indeed an overlapping factor that
had too wide an application; hence Self-Adequacy
was dropped as a factor. Ratings for
Executiveness and Persuasiveness showed up
for a group of situations that were supposed
to go with Gregariousness, but in this case
situations could be added and subtracted from
the group to improve its homogeneity.


Table 1

Agreements and Disagreements in Matching Groups of Situations with Temperament
Definitions for Two Groups of Raters

                     Experienced Analysts (N = 10)    Inexperienced Analysts (N = 4)
                     No. Seeing      No. Not Seeing   No. Seeing      No. Not Seeing
Factors              the Criterion   the Criterion    the Criterion   the Criterion
VARCH                      9               1                2               2
MVC                       10               0                4
DCP                        9               1                4
ISOL                      10               0                4
(Self-Adequacy) a          8               2                1
REPSC                      9               1                4
USI                       10               0                4
PUS                       10               0                4
SJC                       10               0                3
INFLU                     10               0                4
DEPL                      10               0                4
STS                       10               0                4
FIF                       10               0                4

a Dropped.

For example, evaluation of situations in 
Gregariousness (DEPL) resulted in additions 
and subtractions to reduce overlap with other 
factors. The definition of Gregariousness was 
as follows: “Situations involving the necessity 
of dealing with people in actual job duties 
beyond giving and receiving instructions.” 
One situation for this had read: “Endeavors 
to sell gas-powered equipment to existing and 
prospective industrial and commercial cus- 
tomers to increase use of gas in territory.” 

This situation tended to function as a cue 
for Persuasiveness. The following situation 
was substituted because it functioned as a 
cue for Gregariousness only. “Makes appoint- 
ments for employer with clients or customers 
by mail, phone, or in person.” 

An example of a situation retained as nearly 
always providing stronger cues for Gregarious- 
ness than any other factor was this: “Pro- 
motes sales and creates good will for his firm’s 
products by preparing displays, touring the 
country, making speeches at retail dealers’ 
conventions, and calling on individual mer- 
chants to advise on ways and means for 
increasing sales.” 

Similar analysis resulted in the 12 defini- 
tions and groups that were used to express 
the temperament concepts in their final form. 
This form was essentially the same as pre- 
viously illustrated for Trait H, except that 
(a) numbers were substituted for letters and 
(b) verbal symbols made up of the initial 
letters of the key words in the definition were 
also added for identification. Following is a 
list of the definitions and their numerical and 
verbal designations: 


1. VARCH—Situations involving a variety 
of duties often characterized by frequent 
change. 

2. REPSC—Situations involving repetitive 
or short cycle operations carried out according 
to set procedures or sequences.

3. USI—Situations involving doing things
only under specific instruction, allowing little 
or no room for independent action or judg- 
ment in working out job problems. 

4. DCP—Situations involving the direc- 
tion, control, and planning of an entire ac- 
tivity or the activities of others. 

5. DEPL—Situations involving the neces- 
sity of dealing with people in actual job du- 
ties beyond giving and receiving instructions. 

6. ISOL—Situations involving working 
alone and apart in physical isolation from 
others, although activity may be integrated 
with that of others. 

7. INFLU—Situations involving influenc- 
ing people in their opinions, attitudes, or 
judgments about ideas or things. 

8. PUS—Situations involving performing 
adequately under stress when confronted with 
the critical or unexpected or taking risks. 

9. SJC—Situations involving the evaluation 
(arriving at generalizations, judgments, or 
decisions) of information against sensory or 
judgmental criteria. 

0. MVC—Situations involving the evalua- 
tion (arriving at generalizations, judgments, 
or decisions) of information against meas- 
urable or verifiable criteria. 

X. FIF—Situations involving the interpre- 
tation of feelings, ideas, or facts in terms of 
personal viewpoint. 

Y. STS—Situations involving the precise 
attainment of set limits, tolerances, or stand- 
ards. 


Third Stage 


The 12 revised trait definitions were applied 
to the rating of the second sample of 50 jobs.
Ten raters participated. The instructions 
were to select the two factors that together 
best expressed the temperament pattern for 
the job. Here again the rater had to justify 
his pattern, i.e., indicate the cues which led 
him to his conclusions. 

In this final rating, 30 out of the 50 jobs 
had five or more raters agreeing on the tem- 
perament patterns. All the other jobs had 
two to four raters agreeing on patterns. This 
compares with 28 out of 50 jobs on which 
there were majority agreements in the initial
rating. However, these latter agreements involved
10 rather than seven raters, a more
difficult situation in which to get majority
agreement.


Table 2

Mean Occurrence Out of 100 Ratings of Temperament Factors in the Rating of
Two Groups of 50 Jobs

Temperament Factors (a)
Initial Rating              Final Rating
Non-Imaginativeness
Rigorousness                STS
Adaptability to Routine     REPSC
Versatility                 VARCH
Submissiveness              USI
Objectivity                 MVC
Dominance                   DCP
Creativity
Gregariousness              DEPL
Subjectivity                FIF
Persuasiveness              INFLU
Valuativeness               SJC
Self-Control                PUS
Self-Sufficiency
Isolativeness               ISOL

Mean Occurrence (a), Initial Rating and Final Rating (b)
25.1
16.4    19.1
6.5     11.7
4.0     11.1
6.0     9.4
2.0     9.3
4.0
4.0     3.0
2.0     2.9
1.1
1.0     5.5
1.0     1.0
6.5
8

a Blanks under Initial and Final Rating indicate that these factors were not
involved as such in the respective ratings.
b Mean Occurrences of Final Rating appear adjacent to their nearest equivalent
in the Initial Rating, although they are changed as explained in the text.

Table 2 shows the greater spread in the 
mean occurrence of the factors in the final 
rating over the initial rating. This wider use 
of the factors suggests a greater understand- 
ing of the factors in relation to the job in- 
formation available and a more discriminating 
use of them in making judgments. 

A comparison of the cues obtained for 
DCP (Direction, Control, Planning), i.e., 
“Situations involving the direction, control, 
and planning of an entire activity or the 
activity of others” with those obtained for 
the predecessor factor Dominance as set out 
previously indicates how the qualitative re- 
sults improved. Inferences that the trait is 
operating are now justified on the basis of 
homogeneous word and phrase cues such as 
these: “individual performance permits self- 
direction,” “is in complete charge of stables,” 
“is pretty much on his own,” “carries out 
activity without supervision,” “coordinates 
the operation,” “determines procedures,”
“plans and carries out an entire activity.” 


Discussion 


The criterion used for expressing agreement 
was not very satisfactory. It penalized sig- 
nificant agreements. The cues obtained for 
Sugarcane Planter illustrate the point. Agree- 
ments on a pattern of two traits for the 10 
raters ran thiswise 3, 3, 1, 1, 1, 1. The pat- 
terns obtained were as follows: 


DCP—VARCH 3 raters    DCP—MVC 1 rater
DCP—SJC 3 raters      VARCH—MVC 1 rater
DCP—DEPL 1 rater      VARCH—STS 1 rater


It is evident from these patterns that eight 
of the raters agree on DCP and five on 
VARCH. Moreover, substantially the same 
cues were operating for the choices. Note 
the cues given as justification by the five 
raters who saw VARCH as part of the pat- 
tern: “many different tasks carried on . . .
different tools and equipment,” “variety of 
work carried out,” “variety of duties in- 
volved,” “grows, plants, cultivates, harvests, 
markets,” “plants and cultivates . . . cuts and 
hauls . . . engages seasonal labor.” 
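
The point about the Sugarcane Planter ratings can be checked with a short tally, sketched here in Python. The pattern counts are those given above; the function names and the `threshold` parameter are illustrative assumptions (the text reports five or more of 10 raters as the level of pattern agreement in the final rating).

```python
from collections import Counter

# Two-trait patterns chosen by the 10 raters for Sugarcane Planter.
PATTERN_COUNTS = {
    ("DCP", "VARCH"): 3,
    ("DCP", "SJC"): 3,
    ("DCP", "DEPL"): 1,
    ("DCP", "MVC"): 1,
    ("VARCH", "MVC"): 1,
    ("VARCH", "STS"): 1,
}


def factor_agreement(pattern_counts):
    """Count how many raters included each single factor in their pattern."""
    totals = Counter()
    for pattern, n_raters in pattern_counts.items():
        for factor in pattern:
            totals[factor] += n_raters
    return totals


def meets_pattern_criterion(pattern_counts, threshold):
    """True if any one whole pattern was chosen by at least `threshold` raters."""
    return max(pattern_counts.values()) >= threshold


totals = factor_agreement(PATTERN_COUNTS)
print(totals["DCP"], totals["VARCH"])
print(meets_pattern_criterion(PATTERN_COUNTS, threshold=5))
```

Scored pattern by pattern, the job fails the criterion because no pattern drew more than three raters, yet the tally by single factor gives eight raters for DCP and five for VARCH.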

This same situation was true for practically
all jobs not having a majority pattern in the
second rating of 50 jobs. 

Recently, Wherry (1957) in discussing the 
future of criterion research stressed the need 
for “measured interest in the field of job and 
situational analysis techniques including a 
still better definition of the needed elements 
and of methods of estimating their presence 
and importance of both criteria and tests.” 
It may be that the procedure and resultant 
factors outlined here suggest an improved job 
analysis instrument for making the situational 
analysis discussed by Wherry. At any rate 
it seems likely that we must arrive at a more 
effective understanding of the role of language 
as a mediating element in our attempt to get 
at criteria. 


Summary 


A method of rating temperament require- 
ments of jobs on the basis of information 
obtained from written job descriptions was 
developed in order to reflect temperament 
information in a functional occupational clas- 
sification structure. The early procedure was 
to adapt from the literature clinical concepts 
of temperament traits as they occur in people. 
Tryout of this procedure produced associative 
rather than analytical thinking and did not 
achieve reliability. Through a series of stud- 
ies of the shared and unique word and phrase 
cues which led to raters’ inferences about 
temperament requirements, concepts of temperaments
were formulated not as traits in
people but as situations in jobs requiring common
adjustments of workers. These concepts
were defined by an over-all situational con- 
text rather than as clinical concepts of the 
traits themselves. Greatly improved relia- 
bility was obtained. It was suggested that 
defining “temperaments” in terms of the kinds
of situations to which workers must adjust 
may be an effective first step toward a more 
adequate criterion for measuring personality 
concomitants of successful job adjustment. 


Received June 23, 1958.


A major problem in any influence relationship
is to determine the degree of pressure
to be used. Until some 10 or 15 years ago, 
salesmen were usually thought of as using 
“fast words” and pretentious claims and doing 
much “pushing” to constitute what was 
known as “high-pressure” methods of selling. 
As business became more “professional” and 
salesmanship along with it, there emerged a 
recognition that there was a hidden resource 
in the sales prospect himself. The prospect 
liked to buy from the salesman who gave him 
a chance to trust and respect him, something 
which occurred only if the prospect felt that 
he was being permitted to consider the sales- 
man’s proposition fairly and rationally. In 
1947, Bursk (1956) labeled this new sales 
technique “low-pressure selling.” Since then, 
“low-pressure selling” has enjoyed consider- 
able vogue. Recently, however, Bursk (1956) 
and others have been bemoaning the fact that 
what was “low-pressure selling” has deterio- 
rated into “no-pressure” selling. 

Even earlier than this development in sell- 
ing, counselors and psychotherapists were dis- 
covering resources in their clients essentially 
similar to those observed by the “low-pres- 
sure” salesman. In 1942, Carl Rogers (1942) 
published his formulation of counseling which 
has been labeled variously as “client-centered” 
and “nondirective.” A description of the 
Rogerian technique is psychologically very
much like Bursk’s description of “low-pressure 
selling.” Similarly, in the hands of many 
poorly equipped practitioners, Rogers’ tech- 
nique may be said to have drifted into a truly 
“nondirective” method. 

Even earlier, Progressive Education had 
given recognition to these resources in learners 
and attempted to devise programs of educa- 
tion which would capitalize upon them. It 
too, in the hands of ill-trained teachers, 
drifted into a rather laissez-faire approach. 
Apparently low-pressure selling, client-cen- 
tered counseling, and Progressive Education 
all require personnel of high caliber and ade- 
quate training. They all require people who 
are sensitive to the needs of others, believe in 
these inner resources of others, and are will- 
ing to fulfill their roles as leaders and re- 
source persons without resort to trickery. 

Three factors suggested a need for a re- 
evaluation of the “no-pressure” technique. 
One was Bursk’s recent discussion (1956) of 
“no-pressure” techniques as a kind of “sign 
of the times” and occasional observations that 
there may be developing a “reaction forma- 
tion” to such techniques. The second came 
from complaints of many observers that edu- 
cation has become “too soft” and needs to 
stiffen curricula and discipline. There have 
also been cries that counselors are not “per- 
suasive” enough particularly in their educa- 
tional and vocational counseling of abler stu- 
dents. The third came from previous research 
(Torrance & Mason, 1956), in which it was 
found that men who perceived their instruc- 
tors as making no effort to influence them 
behaved more in line with the intended effort 
to influence than those who felt that their 
instructors were trying to influence them. 
The finding, however, was a post hoc one and 
required more definitive exploration. 


Procedure 


The setting of the experiment was the simulated 
nine-day survival exercise of the USAF Survival 
Training School. Instructors of groups of six to
12 men attempted to influence trainees to react 
favorably to an emergency ration known as “pem- 
mican.” The Ss were 427 aircrewmen undergoing 
survival training. All Ss received a double issue of 
the emergency ration including a total of eight 
meat bars. 

A total of 43 instructors in two successive classes 
were involved. Prior to the exercise, training groups 
were divided randomly into one control and six 
experimental groups. In small-group sessions, the 
author and two experienced colleagues trained the 
instructors in the experimental technique to which 
they had been assigned randomly. These techniques 
were designed to represent various hypothesized 
degrees of influence. 

In Experimental 1, instructors were briefed to 
make no effort whatever to influence trainees to ac- 
cept the ration. In Experimental 2, they were in- 
structed to make no effort to influence the accepta- 
bility of the ration except by eating the ration 
themselves and “setting a good example.” Those 
in Experimental 3 were asked to give information 
about the value of the meat bar as an emergency 
ration and about ways of preparing it in an objec- 
tive, “take-it-or-leave-it” manner. Those in Experi- 
mental 4 in addition were asked to emphasize to the 
group the psychological factors (group explanation) 
which affect acceptability. Instructors in Experi- 
mental 5 were given the same instructions as those 
in Experimental 4 except that they were to give 
their explanations to individuals (individual explana- 
tion) rather than to the group. In Experimental 6, 
instructors were briefed to use what we considered 
the coercive method of informing trainees that food 
indoctrination was an integral part of their training 
and that they would be “graded down” if they did 
not “really” try the ration. 

Although the hypothesized degree of pressure was 
in the order listed, questions were asked at the end 
of training so that the experimental techniques could 
be reordered according to degree of pressure perceived
by trainees. First, they were asked outright
to indicate the degree to which their instructor had 
tried to influence them. Second, they were asked 
to indicate which of several influence techniques 
their instructors had used. This list included: de- 
scribed preparation, demonstrated preparation, ate 
ration, gave nutritional facts, explained reasons for 
using in training, told about psychological effects, 
told use of ration would count on grade, and the 
like. 

At the end of training, the following four types 
of acceptability indicators were obtained from each 
S: (a) the traditional hedonic scale (seven-point), 
requiring the S to indicate his reactions to each of 
five methods of preparing the meat bar; (b) the 
number of bars eaten; (c) reasons for not eating 
the remainder (made me sick, too greasy, etc.); and
(d) the conditions under which the S would use 
the ration in the future. 

To determine the perceived pressure of each tech- 
nique, a weight of “1” was assigned when an S re- 
ported that his instructor had made no effort to in- 
fluence him; “2” when some effort to influence was 
perceived; and “3” when very much influence was 
reported. An analysis was also made to determine 
what specific types of influence acts were perceived 
by Ss under the “no-influence” condition. Three 
categories of perceived pressure were established by 
placing: in the “no-pressure” category those who 
checked none of the influence acts as performed by 
their instructor, in the “low-pressure” category those 
who checked from one to five types of instructor 
influence acts, and in the “high-pressure” category 
those who checked six or more types of instructor 
influence acts. 
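The weighting and classification rules just described can be sketched in a few lines. This is only an illustration, assuming that the index is the sum of the three percentages weighted 1, 2, and 3; that reading reproduces the indexes printed in Table 2 for the two rows used here (Exp. 3: 47, 48, and 5 per cent; Exp. 5: 15, 49, and 36 per cent). The function names are hypothetical.

```python
def pressure_index(pct_none, pct_some, pct_much):
    """Weighted index: 1 x % 'none' + 2 x % 'some' + 3 x % 'much' (assumed form)."""
    return 1 * pct_none + 2 * pct_some + 3 * pct_much


def pressure_category(n_acts_checked):
    """Classify an S by the number of instructor influence acts he checked."""
    if n_acts_checked == 0:
        return "no-pressure"
    if n_acts_checked <= 5:
        return "low-pressure"
    return "high-pressure"


# Percentages taken from Table 2 (Exp. 3, giving information; Exp. 5, individual explanation).
exp3_index = pressure_index(47, 48, 5)
exp5_index = pressure_index(15, 49, 36)
print(exp3_index, exp5_index)
print(pressure_category(0), pressure_category(5), pressure_category(6))
```

Under this reading a condition in which every S reported "no effort" would score 100 and one in which every S reported "very much" would score 300.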


Results 


First, an effort was made to obtain a picture
of precisely what instructors who were
coached to make “no effort” to influence were
perceived to do. Table 1 presents the number
and percentage of Ss who perceived these
instructors as performing each of 10 types of
influence acts listed. The instructors’ self
reports are also shown in Table 1. From
these results it can be concluded that it is
difficult for an instructor to deny his role as
influencer. Although instructors assigned the
“no-influence” technique reported fewer influence
acts than those assigned any other condition,
they ranked fourth among the seven conditions in the
proportionate influence acts perceived by
trainees.


Table 1

Number and Percentage of Trainees and Instructors Perceiving Various Instructor
Influence Acts Under “No-Pressure” Condition

                                             Trainee        Instructor
                                             Perception     Perception
Influence Act                                No.    %       No.

Described preparation
Demonstrated preparation
Ate meat bar during training
Gave nutritional facts
Explained reasons for use in training
Advised eating small bits
Advised not to eat when overly fatigued
Advised to eat before becoming too hungry
Told about psychological factors involved
Told use of ration would count on grade

49
18
61

Note.—N of trainees = 57; N of instructors = 7.


Table 2

Percentage Perceiving Various Degrees of Instructor Effort to Influence Under
Seven Conditions and Pressure Indexes

                                   Percentages
                                                      Pressure   Rank in
Condition               N       None   Some   Much    Index      Pressure

Control                                 55     16      187         3
Exp. 1 (No Infl.)       57       41     55      4      159         2
Exp. 2 (Good Ex.)       63       14     68     18      204         6
Exp. 3 (Info.)          62       47     48      5      158         1
Exp. 4 (Grp. Expl.)     61       21     63     16      195         4
Exp. 5 (Ind. Expl.)     65       15     49     36      221         7
Exp. 6 (Evaluation)     43*      12     74     14      202         5

Note.—Chi-square (based on numbers perceiving various degrees of pressure) = 56.7056; df = 8; p < .001.

* Number of Ss in Exp. 6 was reduced by eliminating one crew whose instructor was replaced by an unbriefed instructor in
the field.

Second, to determine the relative degree of
pressure perceived for each condition, percentages
were first determined for each of
the three degrees of perceived effort. Then
weightings were applied to yield comparable
indexes. These data are presented in Table
2. The over-all chi square, using the raw
data, is 56.71 (df = 8; p < .001), indicating
that significant effects occur among the conditions.
Roughly speaking, Experimental III
(Giving Information) and Experimental I
(No Effort) come nearest qualifying as “no-pressure”
methods. The Controls, Experimental IV
(Group Explanation), and Experimental VI
(Evaluation) may be categorized
as “low-pressure” methods, while Experimental V
(Individual Explanation) and Experimental II
(Good Example) would qualify as
“high-pressure” conditions.

Finally, the effects of the seven conditions
are examined in Table 3 which presents the
means and standard deviations of the hedonic
ratings and number of meat bars consumed
and the numbers and percentages “made sick”
and “willing to eat the ration in the future”
for each condition. Bartlett’s Test indicates
that conditions of homogeneity of variance
are satisfied for the hedonic ratings but not
for number of bars consumed. Nevertheless,2
analyses of variance were made for both sets
of data and in both cases the F ratios are
significant at better than the .01 level. Thus,
it will be observed that on all four criteria
the effects due to experimental conditions are
statistically significant, since the chi squares
for both “made sick” and “eat in future” are
significant at better than the .02 and .01
levels, respectively. On all four criteria, it
will be observed that the “no-influence” technique
along with the control conditions occupies
a median position in terms of effectiveness.
The evaluation, information, and group
explanation conditions tend to be accompanied
by more favorable reactions, while the
good-example and individual-explanation conditions
tend to produce boomerang effects.
Thus, the two conditions occupying median
positions of effectiveness were perceived by
trainees as being low in pressure. Two of the
techniques producing best results occupy median
positions on the basis of perceived instructor
pressure and the two conditions producing
boomerang effects are rated highest
in instructor pressure.


2 The Norton study and others (Rogers, 1942) indicate
that failure to satisfy conditions of homogeneity
of variance is not as serious as it was once
regarded. It is generally suggested, however, that a
higher level of significance be required than normally
and this has been done here.


Table 3

Means and Standard Deviations of Hedonic Ratings and Number of Meat Bars Consumed and Percentages
Made Sick and Intending to Use Meat Bar in Future for Seven Conditions

                       Hedonic Rtg.     Bars Consumed    Made Sick        Eat in Fut.
Condition              Mean    SD(a)    Mean    SD(b)    No.   Pctg.(c)   No.   Pctg.(d)

Control                                         3.         8    10.5             38.2
Exp. 1 (No Infl.)      21.61   6.88     5.66    3.07      15    26.3      24     42.1
Exp. 2 (Good Ex.)      23.84   6.58     5.66    2.34      16    25.4      18     28.6
Exp. 3 (Info.)         19.63   7.46     7.95    4.00      11    17.7      38     61.3
Exp. 4 (Grp. Expl.)    19.21   5.90     6.75    2.94      17    27.9      42     52.5
Exp. 5 (Ind. Expl.)    23.15   6.74     5.57    5.28      19    29.2      17     26.2
Exp. 6 (Evaluation)    18.09   6.15     7.79    1.17       3     7.0      24     55.8

a Conditions for homogeneity of variance met. F between highest and lowest variance = 1.608, not significant.
F ratio (between groups to within groups) = 5.45, p < .01.
b Using Bartlett's Test, requirements for homogeneity of variance not satisfied (p < .001). F ratio (between groups to
within groups) = 4.67, p < .01.
c Chi square = 16.759; df = 6; p < .02.
d Chi square = 27.227; df = 6; p < .01.


Table 4

Means and Standard Deviations of Hedonic Ratings and Number of Meat Bars Consumed and Percentages
Made Sick and Intending to Use Meat Bar in Future for Each of
Three Degrees of Perceived Pressure

Degree of Pressure             Hedonic Rtg.     Bars Eaten       Made Sick   Eat in Fut.
(No. Types of Infl. Acts)      Mean    SD(a)    Mean    SD(b)    %(c)        %(d)

No Pressure (0 acts)           18.32   7.39     5.20    8.17     13.9
Low Pressure (1-5)             20.94   6.87                      21.5
High Pressure (6 or more)      23.28   6.67                      23.4

a Conditions for homogeneity of variance satisfied. F ratio between highest and lowest variance = 1.23, not significant.
F ratio (between groups to within groups) = 6.88, p < .01.
b Using Bartlett's Test, requirements for homogeneity of variance not met (p < .001). F ratio (between groups to within
groups) = 7.28, p < .001.
c Chi square = 1.80; df = 2, not significant.
d Chi square = 15.43; df = 2, p < .001.
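The chi square reported for “made sick” can be roughly checked from the counts in Table 3. The sketch below is only an illustration under stated assumptions: the group sizes for the six experimental conditions are those printed in Table 2, and the control group size of 76 is not printed anywhere but is inferred here from the total of 427 Ss and from the control entry of 8 Ss, or 10.5 per cent, made sick. The function name is hypothetical.

```python
# Numbers "made sick" per condition, from Table 3 (Control, Exp. 1 ... Exp. 6).
made_sick = [8, 15, 16, 11, 17, 19, 3]
# Group sizes; the first value (76, control) is an inferred assumption.
group_n = [76, 57, 63, 62, 61, 65, 43]


def chi_square_2xk(successes, totals):
    """Pearson chi square for a 2 x k table of counts (sick / not sick by condition)."""
    p = sum(successes) / sum(totals)  # pooled proportion made sick
    chi2 = 0.0
    for s, n in zip(successes, totals):
        for observed, expected in ((s, n * p), (n - s, n * (1 - p))):
            chi2 += (observed - expected) ** 2 / expected
    return chi2


chi2 = chi_square_2xk(made_sick, group_n)
df = len(group_n) - 1
print(round(chi2, 2), df)
```

With these inputs the statistic comes out near 16.7 on 6 degrees of freedom, close to the 16.759 given in the note to Table 3; the small gap may reflect rounding or the assumed control N.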

When direct tests are applied, results ob- 
tained under the “no-pressure” condition and 
using hedonic ratings as the criterion are significantly
better than the most effective condition
(CR = 2.687, p < .01) and poorer
than the least effective one (CR = 1.813,
p = .06). Using the number of bars consumed
as the criterion, the difference between the 
“no-pressure” condition and the most effec- 
tive condition is significant (CR = 3.82, cor- 
rected for variance, p < .01) but not between 
the “no-pressure” condition and the least ef- 
fective condition. Results based on “made 
sick” are the same as for “bars eaten” (chi 
square = 6.2105, p= .01) and those for “eat 
in the future” are the same as for hedonic 
ratings (chi squares = 4.3799 and 3.1793, 
respectively, for “no pressure” versus most 
and least effective). 

Table 4 presents a comparison of the ac- 
ceptability of the meat bar to those classed 
in the no-pressure, low-pressure, and high- 
pressure categories on the basis of number of 
kinds of instructor influence-acts. It will be 
observed that on this basis there is a consist- 
ent tendency for all of the indexes of accept- 
ance to vary inversely with degree of pres- 
sure as defined here. The differences are all 
statistically significant except for percentage 
“made sick.” It should be noted from Table 4 
that the “no-pressure” group is unusually 
variable on the number of bars consumed. 


Discussion


Apparently it is difficult for anyone cast in 
a social role such as instructor to avoid influ- 
encing or being perceived as influencing role 
partners. Whatever he does, whether so in- 
tended, tends to be interpreted as an effort to 
influence. This is probably true not only in 
the teacher-student relationship, but in such 
relationships as counselor-counselee, physi- 
cian-patient, parent-child, salesman-client, su- 
pervisor-worker, and the like. 

It appears that members of training groups 
expect instructors to exercise influence toward 
the group as a whole and influence acts so di- 
rected tend not to be interpreted as “high 
pressure.” Apparently such pressures tend 
to be regarded as legitimate. When “personal 
influence” is introduced and individuals are 
singled out for persuasive efforts or attention 
of any kind, these individuals tend to feel that 
they are being “high pressured.” A number 
of rationales for this phenomenon might be 
advanced. In the face of influence efforts, 
individuals may perceive the group as a protection.
A student singled out and approached
individually by an instructor feels
stripped of his defenses (the group) and as 
a result stiffens his resistance. 

The data suggest that pressures up to a 
point, particularly if they are perceived as 
legitimate in terms of the influencer’s official 
role, are effective. Beyond a certain point or 
outside the limits of what is regarded as legiti- 
mate, however, resistance is stiffened and at- 
tempts to influence tend to boomerang. 

The data concerning variability are of spe- 
cial interest. Perception of “no pressure” 
seems to be accompanied by especially erratic 
effects. This is also true of the generally 
effective technique of giving objective infor- 
mation in a matter-of-fact manner. It is in 
this respect that the giving-information and 
evaluation conditions differ most in their ef- 
fects. This emphasizes the difficulty of gen- 
eralizing too broadly concerning influence 
techniques. The method of influence that 
may be most effective may depend upon the 
salesman and the prospect, the counselor and
the counselee, and the like. 

Another difficulty in generalizing broadly 
from this study is that there are elements in 
the situation studied which are probably dif- 
ferent psychologically from conditions in per- 
sonal selling, counseling, and the like. It is 
believed, however, that many of the influence 
dynamics are essentially similar. 


Summary 


In this study, an effort was made to test 
experimentally the relative effectiveness of 
varying degrees of pressure exerted by in- 
structors in indoctrinating aircrewmen con- 
cerning an emergency ration known as “pem- 
mican.” The Ss were 427 aircrewmen com- 
posing 43 small training groups randomly 
assigned to one control and six experimental 
groups. Subjects were issued eight of the 
meat bars for use during the nine-day simu- 
lated survival experience. Criteria of accept- 
ance were obtained at the end of training 
along with measures of perceived instructor 
effort to influence. It was found that instruc- 
tors were relatively unsuccessful in exercising 
“no influence” insofar as trainee perceptions 
are concerned. When the seven conditions 
were arranged in order of perceived instruc- 
tor pressure, it was found that pressure up to 
a certain point appears to be accompanied 
by increased acceptability and that beyond 
this point influence efforts operate in an in- 
verse direction to that intended. Those who 
perceive “no effort” to influence them, tend 
to react most favorably. 


Received July 1, 1958. 


PREFERENCES FOR LETTERS OF THE ALPHABET


MICHAEL MECHERIKOFF
Westmont College

AND DAVID L. HORTON
University of Minnesota


This study is an attempt to answer the 
questions: “Do consistent letter preferences 
exist?” and “If preferences exist, which letters
may be considered approximately equal
to each other in appeal?” Interest in these
questions arose when a large food processing
company found that the same product packaged
in containers differing only in label
yielded large and significant preferences. 

A direct attack on the question of whether 
or not letter preferences exist does not ap- 
pear to have been made, although many stud- 
ies have been done to investigate preferences 
in other areas. Closest to the present prob- 
lem are studies of number preferences, e.g., 
Yule (Chapanis, Garner, & Morgan, 1949; 
Yule, 1927) and a study by Forer (1940) 
which deal with preferences for sounds of 
consonants. Although studies of this kind 
lead one to suspect the existence of letter 
preferences of the type in which we are in- 
terested, they are of no value in assessing 
which letters are preferred in situations simi- 
lar to that of the food processor, or in deter- 
mining which letters may be considered equal 
for purposes such as this. 


Method 


The method of paired comparisons, although one 
of the most adequate methods for securing judg- 
ments of preferences, presents an enormous task to 
the Ss if very many objects are to be compared. To 
cut down the number of letters to be tested for 
equality of preference two preliminary studies were 
done, and seven letters which showed the least evi- 
dence of preferences were then used in paired com- 
parisons. 

In the first study the Ss were told a story about a 
mythical community in which the inhabitants named 
their children after letters of the alphabet. The Ss 
were asked to “name a new baby” by indicating the 
five letters they thought most appropriate, the five 
next most appropriate, the five least appropriate, the 
five next least appropriate, and the six remaining. 
For each letter a graph was made showing its fre- 
quency in each of the five categories. 

In the second study, 80 students in an advanced
psychology course were asked to rank order the
alphabet. Half of them were asked to rank according
to the way the letter looked, the other half according
to the way the letter sounded. The degree of concordance
among students on each of these tasks was
low. The rank difference correlation between the 
mean ranks on the two lists was .50, which indicates 
that there is only moderate agreement when different 
criteria for judging are used. 

The seven letters for paired comparison were se- 
lected as follows: 

1. If a letter appeared among the middle 10 letters
of both the “sound” and “looks” lists, it was
included. This gave T, P, N, and K, and all except 
K appeared neutral from the first preliminary study 
as well. 

2. The letters S, V, and G were neutral on both 
the first preliminary study and the “sound” list and 
so were included. 

The seven letters gave 21 pairwise combinations, 
which were mimeographed in capitals on sheets of 
8½ X 5½ paper. S’s task was to go down the list
rapidly and in each pair circle the letter he pre- 
ferred. This list of pairs was randomized, and some 
changes were made in the random order so as to 
avoid the appearance of the same letter in succes- 
sive pairs as much as possible, and so that each let- 
ter appeared in right and left positions an equal 
number of times. Eight forms of the basic list were 
used to control three other variables. 

1. Straight-Reversed. To control for preferential 
treatment of the beginning or end of the list, the 
order of the entire list was reversed for one-half of 
the Ss. 

2. AB-BA. To control for right-left preferences, 
the position of the two letters was reversed half of 
the time. 

3. Split-Nonsplit. To equalize the differential 
treatment of pairs appearing at the extremes of the 
list versus the middle, the relative position of the 
two halves was changed for half of the cases. 
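The construction of the pair list and its eight forms can be sketched as follows. This is an illustration only, under several assumptions that the text does not confirm: the base list here is in plain combinatorial order rather than the adjusted random order used in the study, “AB-BA” is read as reversing the letters within every pair of a form, and “Split-Nonsplit” is read as exchanging the first and second parts of the list at its midpoint. The function name is hypothetical.

```python
from itertools import combinations, product

LETTERS = ["G", "K", "N", "P", "S", "T", "V"]


def make_form(pairs, reverse_list, swap_sides, swap_halves):
    """Return one form of the pair list under the three control variables."""
    form = list(pairs)
    if swap_halves:  # Split-Nonsplit (assumed: exchange the two parts of the list)
        mid = len(form) // 2
        form = form[mid:] + form[:mid]
    if reverse_list:  # Straight-Reversed: run the whole list backwards
        form = form[::-1]
    if swap_sides:  # AB-BA (assumed: reverse left and right letter in each pair)
        form = [(right, left) for left, right in form]
    return form


base_pairs = list(combinations(LETTERS, 2))  # 7 letters taken 2 at a time
forms = [make_form(base_pairs, *flags) for flags in product([False, True], repeat=3)]
print(len(base_pairs), len(forms))
```

Three two-level variables give 2 x 2 x 2 = 8 forms, each containing the same 21 pairs in a different arrangement.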

The eight forms were distributed to the Ss randomly.
The 182 (138 males, 44 females) Ss were
students taking an introductory psychology course 
at the University of Minnesota. 


Results 


In order to determine whether letters differ
in preference, significance tests were performed
both for letters within pairs, and for each
letter when considered across all the pairs in
which it appeared. Table 1 and Table 2 present
these data. In both cases the deviation
of the observed proportion from the hypothesized
value of .50 was tested for significance,
six of the pairs showing preferences at the
5% level. When tested irrespective of any
particular pairing, one letter (S) was significantly
preferred at the 1% level, and one
(K) was significantly nonpreferred at the 5%
level. It should be noted that the test for
significance uses an N of 1092 providing the
measures are independent. Since the 1092
measures were not independent but appeared
in related groups of six, N = 182 was used to
test significance, which is conservative. The
results indicate that there are stronger preferences
for some letters than for others, at
least when the letters are presented in pairs.

Since some preferences are revealed in this
analysis and since the letters used here were
expected to yield the smallest differences, it
seems likely that the other letters of the alphabet
would yield differences as great or
greater (although this does not preclude finding
pairs of equally liked or equally disliked
letters).


Table 1

Proportion of Individuals Choosing Letter on Left
When Paired with Letter on Top

G K N P S T
K
N .58*
.58* .54 .54
S
.55 .45 .52 .42* .53

* Significant at the 5% level.
** Significant at the 1% level.


Table 2

Ratio of Number of Times Each Letter Was Selected to
the Possible Times It Could Be Selected

Letter    Ratio
V         .50
T         .44
G         .45
K         .41*
S         .64**
N         .56
P         .49

* Significant at the 5% level.
** Significant at the 1% level.
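The significance tests on single proportions can be illustrated with a normal approximation. This is a sketch only: the text does not state which test statistic was used, so the two-sided normal-approximation test below is an assumption, as is the function name. It uses the conservative N = 182 and the ratios of .64 for S and .41 for K as read from Table 2, together with a pair proportion of .58 from Table 1.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(proportion, n=182, p0=0.50):
    """z statistic and two-sided p for an observed proportion against p0 (normal approximation)."""
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (proportion - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z_s, p_s = two_sided_p(0.64)        # letter S across all its pairs
z_k, p_k = two_sided_p(0.41)        # letter K across all its pairs
z_pair, p_pair = two_sided_p(0.58)  # a single pair from Table 1
print(round(z_s, 2), round(z_k, 2), round(z_pair, 2))
```

Under this approximation S falls beyond the 1% level, while K and a pair proportion of .58 fall between the 5% and 1% levels, which agrees with the asterisks in the tables.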

Analyzing for sex differences and positional 
preference, the pair NK showed a significant 
sex difference at the 1% level, and the pairs 
KP, VK, and KT were significant at the 5% 
level. The letter K was preferred by more 
women than men, and this female preference 
for K holds in the other two pairs involving 
K, although it is not statistically significant. 
Preferences of this variety would be of im- 
portance when only one sex is judging the 
product. No significant positional prefer- 
ences were discovered. 


Selecting Equal Pairs 


In problems where one is interested in de- 
tecting whether or not a difference really 
exists, the significance level is of central im- 
portance. But if one wishes to establish with 
a high probability that the difference is not 
larger than a certain amount (which is deter- 
mined by practical considerations), then the 
power of the test is of primary concern. The 
method of computing power is described by 
Walker and Lev (1953). 

For the present study it was decided that 
if the proportion of Ss preferring one letter 
over another fell between .45 and .55, these 
letters would be considered equal in appeal. 
With sample size and limits of allowable deviation
fixed, the only way to increase the
power is to change the significance level. 

Figure 1 indicates the relationship between 
power and significance level when N = 182 
and the allowable deviation from p = .50 is 
.05.

For this study it was decided that the 
power of the test should be approximately 
80%. The power was computed at the point 
p = .45 and p = .55 (the power is the same
at both of these points because of symmetry). 
If the true proportion is greater than .55 or 
less than .45, the power will be greater than 
it is at these points, and detection of the 
falsity of the null hypothesis will be even 
more probable. 
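The trade between significance level and power described here can be sketched numerically. The block below is an illustration under assumptions, not the computation of Walker and Lev: it uses a two-sided normal-approximation test of p = .50 with N = 182, evaluates power at a true proportion of .55, and counts rejections in both tails. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided(alpha, n=182, p0=0.50, p1=0.55):
    """Approximate power of a two-sided test of p = p0 when the true proportion is p1."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    se_null = sqrt(p0 * (1 - p0) / n)  # standard error under the null hypothesis
    se_true = sqrt(p1 * (1 - p1) / n)  # standard error under the true proportion
    upper = p0 + z_crit * se_null
    lower = p0 - z_crit * se_null
    reject_high = 1 - normal.cdf((upper - p1) / se_true)
    reject_low = normal.cdf((lower - p1) / se_true)
    return reject_high + reject_low


power_at_05 = power_two_sided(0.05)
power_at_66 = power_two_sided(0.66)
print(round(power_at_05, 2), round(power_at_66, 2))
```

With the sample size and tolerated deviation fixed, the conventional 5% level leaves the power low, and only a much more lenient significance level brings it into the region of 80 to 85 per cent.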

The question of equality of letters in pairs
can now be considered specifically. When the
two pairs with the smallest differences, GT
and VP, are considered, the resulting significance
level is 66%, which corresponds to a
power of approximately 85%. Thus, with
these two pairs it can be said with a rather
high degree of certainty that the letters are
almost equal in preference. If the inclusion
of more pairs is desired, the power of the test
must suffer; however, with inclusion of two
additional pairs, VK and TV, the power still
remains about 75%. It appears then that
some pairs of letters do not show much difference
in preference value, and thus could
be used in brand preference tests. The number
of such pairs that could be used will depend
on how certain one desires to be that
real preferences do not exist, and how much
deviation from an even split will be tolerated.


[Figure 1: significance level plotted against power (per cent).]

Fig. 1. Significance level versus power of the test
that p = .50, when N = 182, and the true proportion
in the population is .45 or .55.


Summary and Conclusions 


To determine whether or not consistent 
preferences for letters of the alphabet exist 
in the populations, and to identify pairs of 


letters which have equal preference value, 
seven letters were presented pairwise in all 
possible combinations to 182 students (138 
males, 44 females) at the University of Min- 
nesota. Only seven letters were used in or- 
der to reduce the Ss’ task, these seven being 
chosen on the basis of two preliminary studies 
as having the least likelihood of being differ- 
ent from each other in appeal. By lowering 
the significance level of the statistical test, a 
few pairs can be found for which the prob- 
ability is high that they are nearly equal. 

1. The following pairs showed a preference 
for the letter listed first at significance level 
indicated: 


1% level: SK, SG, SP, ST, GK 
5% level: PG, NK, NG, SN, SV, TK 


2. The following pairs show no preference 
for either letter, with .50 + .05 as the toler- 
ated limits: 


Power 85%: GT and VP 
Power 75%: TV and VK 


3. There do not appear to be any consist- 
ent position preferences or sex differences, ex- 
cept that the letter K is significantly preferred 
by more women than men. 


Received July 9, 1958. 


Psychologists have been interested in the
problem of faking on personality and interest 
inventories for at least a quarter century. In 
the classical experimental design (Cross, 
1950; Gough, 1947; Kelly, Miles, & Terman, 
1936; Kimber, 1947; Longstaff, 1948; Noll, 
1951; Rabinowitz, 1954; Steinmetz, 1932) 
in which Ss are asked to fake a self-report in- 
strument in a certain way, they have usually 
obliged. Authors of these studies have pointed 
out, however, that their demonstration that 
self-report inventories can be faked does not 
necessarily mean that they are faked in a 
counseling situation, or even in a personnel 
selection situation for that matter. 

Primary interest in this area appears to be 
turning toward the study of faking in situa- 
tions involving real-life motivation. In a 
carefully designed study Heron (1956) con- 
cluded that emotional maladjustment scores 
are more favorable (implying more faking) 
when an inventory is administered under the 
conditions of employee selection than when 
the situation is perceived as participation in 
research. In the case of sociability scores no 
significant difference was found. Gordon and 
Stapleton (1956), using a forced-choice scale, 
found significantly higher mean scores on 
their emotional stability scale and responsi- 
bility scale (implying more faking) under 
realistic high school employment conditions 
than under routine vocational guidance con- 
ditions. No difference was found on the 
ascendency scale under the two conditions, 
while significantly higher scores on sociability 
were found under the guidance condition than 
under the employment condition. The authors 
concluded that individuals in their study did 
not change their profile patterns substantially 
from the guidance to the employment condition
and that increases in scores, while statistically
significant in some instances, nevertheless
were moderate.


1 The opinions expressed are those of the writers
and are not necessarily shared by the Department
of the Navy.

Of further interest in this area is the ques- 
‘tion of whether or not individuals who already 
have validly high interest scores on a given 
occupation can (and do) increase their scores 
even further when it is to their interest to 
do so. Tacitly assuming that some faking
does occur in such situations, the present 
study investigated two hypotheses concerning 
conditions which affect faking on an interest 
inventory and a personality inventory in a 
military occupational classification situation. 
The two hypotheses were: 


1. Scores made on an inventory under con-
ditions where Ss are aware that a “falsifica-
tion score” will be computed will be less
favorable than scores made under standard 
conditions of administration. (Implying less 
faking.) 

2. Scores made on an inventory under con- 
ditions where Ss are directed to fake, and re- 
inforced with appropriate incentives for doing 
so, will be more favorable than scores made 
under standard conditions of administration. 
(Implying more faking.) 


Procedure 


The U. S. Navy Vocational Interest Inventory de- 
veloped by Clark (1949, 1956) and the Minnesota 
Multiphasic Personality Inventory (MMPI) were 
administered to a total of 773 U. S. Navy aviation 
recruits under three conditions as follows: 


Condition 1. The Ss were informed that the in- 
formation provided by the inventories would be used 
in the counseling and assignment procedure to assist 
in deciding to which Naval Aviation occupation they 
would be assigned. Standard directions for the two 
tests were then read by the examiner with the Ss 
following in their copies of the directions. 

Condition 2. This condition was identical with the 
one just described except that the following state- 
ments were added: “In addition to the scores on 
interest and personality, the inventories yield a falsi- 
fication score called a lie score. That is, it is possible 
to measure how frankly or honestly each individual
has answered the inventory items. Anyone having
a high falsification or lie score will be in a less
favorable position to get the occupation he wants,
than those who mark the inventories truthfully.”

Condition 3. Before the standard directions were
read, the Ss were informed that it was planned
to use one or both of the inventories in a new train-
ing program with which the Ss were familiar. It
was explained that information was needed as to how 
much the two inventories could be faked. An incen- 
tive, consisting of assignment to training for the 
occupation of their choice, was offered for the five 
Ss who made the highest interest score on their pre- 
ferred occupation and who presented themselves in 
the most favorable light from the standpoint of 
personality scores. It was stipulated, however, that 
they must meet minimum aptitude requirements for 
the desired training in order to get the assignment. 


Each of the three basic conditions was further 
divided into two subconditions. In one subcondition, 
the interest inventory was administered first and the 
personality inventory second. Under the second sub-
condition, the order of administration of the two
inventories was reversed. In the treatment of the 
data by analysis of variance the effects due to order 
were removed. 

Prior to giving the directions for taking the inven- 
tories each of the groups was asked to fill out an 
Occupation Preference Sheet on which the Ss indi- 
cated their choice of the 12 Naval Aviation occupa- 
tions. This sheet also contained an item which 
inquired into their degree of preference for the 
occupation selected. On the basis of this informa- 
tion the basic experimental groups were further 
divided into two groups, those who greatly preferred 
a certain occupation and those who indicated a less 
decided preference for the occupation of their choice. 

Of the 773 men who took the inventories, 581 indi- 
cated that they would prefer one of the following 
aviation occupations: engine mechanic, structural 
mechanic, ordnanceman, electronics technician, elec- 
trician, or storekeeper. Interest inventory keys de- 
veloped specifically for these occupations or for 
similar nonaviation Navy occupations were used in 
scoring the interest inventory. These 581 students 
constitute the sample used in the study. Of this
number, 426 indicated that they “greatly preferred”
the rating they indicated on the Occupation Prefer- 
ence Sheet and 155 indicated a less intense preference. 
For purposes of the study the 426 men who “greatly 
preferred” one of the ratings were considered as a 
more highly motivated group occupationally than 
the 155 who marked the other alternatives, and the 
scores made by the two groups were analyzed sepa- 
rately. The first group will be referred to as the 
“greatly prefer” group and the other group as the 
“non-greatly prefer” group. 

The interest scores for the six occupational keys 
were standard scores having a mean of 50 and a 
standard deviation of 10. The standardization group 
was a group of approximately 2000 Navy Airman 
Preparatory School students who had taken the inter- 
est inventory previously. The present data were 
collected in the same Airman Preparatory School. 
At the conclusion of this five-week course the stu-
dents were assigned to more specialized training
which leads to one of the Naval Aviation occupa- 
tions. The Ss were members of three classes, one of 
which was in the second week of training, one in 
the third week, and one in the fourth week. Each 
class contained 12 instructional sections. The sec- 
tions, which were filled as the men reported for 
training, were assigned to the experimental conditions 
in a manner designed to avoid any systematic bias 
that might be associated with class membership. 
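
The standard-score scale mentioned above (mean of 50 and standard deviation of 10 in the standardization group) is a linear rescaling of raw key scores. The following is a minimal sketch of that rescaling; the function name and the small reference sample are hypothetical and are not data from the study.

```python
from statistics import mean, pstdev

def to_standard_scores(raw_scores, reference_scores):
    """Rescale raw scores so the reference group has mean 50 and SD 10."""
    ref_mean = mean(reference_scores)
    ref_sd = pstdev(reference_scores)  # population SD of the reference group
    return [50 + 10 * (x - ref_mean) / ref_sd for x in raw_scores]

# Hypothetical raw key scores for a small reference group (not study data).
reference = [12, 15, 18, 21, 24]
standardized = to_standard_scores(reference, reference)
```

Scoring the reference group against itself, as here, should return values with a mean of 50 and an SD of 10; scores of any other group are then read relative to that scale.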


Results 


If both of the hypotheses under considera- 
tion were correct the lowest mean interest 
inventory score, and an elevated F score on 
the MMPI, should occur under Condition 2, 
that is, knowledge that a falsification score 
would be computed. The second lowest score 
on occupational interest should occur under 
Condition 1, standard directions, and the 
highest score on occupational interest, along 
with an elevated L score, should be observed 
under Condition 3, in which the Ss were 
directed to fake the inventory as much as
possible. The reverse would be true of scores
on the clinical scales of the MMPI, as mal-
adjustment was considered to be associated
with high scores.

In general, the results of comparisons in-
volving interest inventory scores and MMPI
clinical scale scores supported neither of the
hypotheses. In the comparison of the three
experimental groups on the basis of interest
scores for the occupation preferred by each
individual, no significant differences were
found for either the high preference group or
for the low preference group. Means and
sigmas are presented in Table 1 (the F test
values shown in the table do not relate to
this comparison, but instead to another com-
parison that will be described in the next
paragraph). Neither did the three basic
groups differ significantly on any of the clini-
cal scales of the MMPI.2 But, as shown in
the F test comparison in Table 2, MMPI F
scale scores were significantly higher within
the greatly prefer group under Condition 2,
or knowledge that a falsification score would
be computed, and L scale scores were signifi-
cantly higher under Condition 3, instructions
to fake, for both preference groups, when
compared with scores made under the condi-
tion of standard directions (scores for the
third, nonhypothesized condition in both the
F scale and L scale comparison did not differ
significantly from the condition of standard
directions). Unlike the general trend of the
findings, these results are in accord with the
hypotheses.


Table 1

Comparison of Interest Scores on Preferred Occupation of “Greatly Prefer” and
“Non-Greatly Prefer” Groups Under Three Conditions

                                    “Greatly Prefer” Group    “Non-Greatly Prefer” Group
Condition                             N     Mean     SD         N     Mean     SD         F
Standard Directions                  146    64.54    8.50       52    61.00    7.55     6.97**
Knowledge of Falsification Score     145    63.76    8.53       50    59.46    8.24     9.52**
Directions to Fake                   135    64.49    9.30       53    61.28    9.29       …

* .05 level of confidence.
** .01 level of confidence.


Table 2

Comparisons of MMPI F and L Scores Made Under Different Conditions, Effects Due to
Order of Administration of the Two Inventories Removed

                                      “Greatly Prefer” Group         “Non-Greatly Prefer” Group
Condition                              N     Mean    SD      F        N     Mean    SD      F
MMPI F Score
  Standard Directions                 146    2.90    2.41   7.40**    52    4.34    6.21   N.S.
  Knowledge of Falsification Score     …      …       …               50    3.98     …
MMPI L Score
  Standard Directions                  …      …       …               52    4.35     …
  Directions to Fake                  135    5.42    2.88    …        53    5.45    2.12    …

* .05 level of confidence.
** .01 level of confidence.


2 A 2-page table giving data for each of the MMPI
scales has been deposited with the American Docu-
mentation Institute. Order Document No. 5834, re-
mitting $1.25 for 35-mm. microfilm or $1.25 for 6
by 8 in. photocopies.

An additional comparison aids in the inter- 
pretation of the results and is also of interest 
in its own right. As shown in the previously 
mentioned F test comparison in Table 1, men 
who indicated on the Occupation Preference 
Sheet that they greatly preferred the occupa- 
tion of their choice made higher mean scores 
on the interest inventory than did men who 
did not greatly prefer the occupation of their 
choice. This occurred despite the fact that 
both groups were well above the mean for 
Navy men in general. These results appar- 
ently reflect the validity of the interest inven- 
tory and are in accord with more direct evi- 
dence concerning the validity of the inventory 
(Clark: 1949, 1956; Clark & Gee, 1954; 
Mayo & Thomas, 1956). 
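
As a rough check on the F test comparison between preference groups, a two-group F ratio can be computed from summary statistics alone. The sketch below uses the Standard Directions entries of Table 1 ("greatly prefer": N = 146, mean 64.54, SD 8.50; "non-greatly prefer": N = 52, mean 61.00, SD 7.55) and assumes a plain one-way comparison with pooled variance. The published analysis also removed order effects, so only approximate agreement with the tabled 6.97 should be expected; the function name is illustrative.

```python
def two_group_f(n1, mean1, sd1, n2, mean2, sd2):
    """One-way F ratio for two independent groups from summary statistics."""
    # Pooled within-group variance (sample SDs assumed, denominator n - 1).
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    # The squared pooled-variance t statistic equals F with 1 and n1+n2-2 df.
    t = (mean1 - mean2) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return t * t

# Standard Directions row of Table 1.
f_standard = two_group_f(146, 64.54, 8.50, 52, 61.00, 7.55)
```

Under these assumptions the ratio comes out close to 7, in line with the tabled value.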


Discussion 


Summarizing the results briefly: first, no sig- 
nificant differences were found between inter- 
est inventory scores or between MMPI clinical 
scale scores that could be attributed to the 
experimental conditions. Second, such differ- 
ences were found in the case of the MMPI 
F scale and L scale. Third, interest inventory 
scores were significantly higher for men who 
indicated that they greatly preferred a given 
occupation as compared with scores made by 
men who expressed a milder preference for 
the occupation of their choice. 

An interpretation of the data that was con-
sidered was that all three experimental groups
made the best score they could on both inven- 
tories, that is, faked them as much as possible 
and hence made essentially the same mean 
scores on the interest inventory and the clini- 
cal scales of the MMPI. But this interpreta- 
tion encounters difficulty in explaining why 
the L scores of the group that was instructed 
to fake were higher than those of the groups 
that were not instructed to fake. Nor does it
readily account for the finding that the men who
did not greatly prefer their occupational
choice had lower interest scores but essentially
the same MMPI L scores as the men who
greatly preferred the occupation of their
choice. Furthermore, mean L scores were
moderately low, less than a raw score of 5, 
in the groups that were not instructed to fake. 

A more likely interpretation, and one which 
is believed to be compatible with all the 
findings, is that under the present design there 
was no marked, systematic tendency for the 
interest inventory and the clinical scales of 
the MMPI to be faked successfully under any
of the conditions, although the evidence points 
to an attempt to fake them under Condition 
3. It is suggested that the generality of 
this interpretation be considered as tentative 
and subject to change in the light of further 
evidence collected in vocational classification 
situations. It should be made explicit that 
no claim is made that the two inventories 
were not faked to some extent. Further, at- 
tention is again drawn to a special condition 
in the design of the study, namely, that indi- 
viduals with an already validly high degree 
of interest in an occupation were being tested. 
This makes the demonstration of faking as 
indicated by even higher interest scores fairly 
difficult, but is, in the writers’ opinion, closer 
to a real-life situation than are the classical 
experimental designs concerning faking. It 
would appear that the design did provide an 
adequate opportunity for any large scale fak- 
ing to be detected, and the results failed to 
substantiate that this occurred. The results 
concerning the effect of providing information 
that a falsification score will be computed are 
interpreted to indicate that the procedure 
either is not needed with these two inven-
tories or that it is not effective under the
conditions of this study.


Summary


Two hypotheses concerning faking on an 
interest inventory and a personality inventory 
were tested under the real-life motivation con- 
ditions of a military occupational classifica- 
tion situation. Generally, the results failed 
to support either the first hypothesis—that 
knowledge that a falsification score would be 
computed would result in less favorable mean 
inventory scores, thereby indicating less fak- 
ing—or the second hypothesis—that direc- 
tions to fake the inventories, accompanied by 
an appropriate incentive, would result in more 
favorable mean inventory scores, indicating 
more faking. 

Significantly higher MMPI “lie” L scores
were made under directions to fake the 
inventories, and significantly higher MMPI 
“validity” F scores were found in one of 
the two groups under the condition involving 
knowledge that a falsification score would be 
computed. The men who said they greatly 
preferred a certain aviation occupation made 
higher mean interest inventory scores on the 
occupation they preferred than did men who 
expressed a less intense interest in their pre- 
ferred occupation. The data do not force 
the conclusion that faking on the two self- 
report inventories in a military vocational 
classification situation is minimal, but the 
experiment provided a favorable opportunity 
for faking to manifest itself and it was not 
observed to any appreciable extent. 


Received July 17, 1958.


USE OF TELEVISION FOR REMOTE CONTROL:
A PRELIMINARY STUDY


ROBERT L. MARTINDALE AND WILLIAM F. LOWE
Air Force Special Weapons Center 


Future air weapon and space flight systems 
are expected to require a wide range of re- 
mote control and manipulation activities. In 
many of these cases direct sensory contact 
with the remote field of activity will not be 
possible. Closed circuit television has been 
suggested as a simple means to provide visual 
feedback under such conditions. Unfortu- 
nately, it has not been commonly realized 
that the use of television may systematically 
alter the visual field. Such alterations can 
produce movements in the visual field that 
conflict with the movements an operator 
might expect from his motor performance. 
An earlier study demonstrated the disruption 
of performance that may result when the 
visual field of direct motor behavior is sys- 
tematically altered through the use of tele- 
vision (Smith, Smith, Stanley, & Harley, 
1956). 

The present study is a preliminary experi- 
mental test of the use of television in a re- 
mote performance situation. This was a sim- 
ple cyclic task performed by means of a sim- 
ple extension of a motor end organ. Two 
general questions were posed. Does accuracy 
of performance vary when the visual field 
is systematically varied? Secondly, does the 
accuracy of performance improve when the 
visual orientation is normalized but the pro- 
prioceptive cues remain systematically al- 
tered? It is reasonable to assume that any 
deterioration in the accuracy of performance 
demonstrated with the simple cyclic task of 
the present experiment might be even more
pronounced with the irregular tasks that
would be more typical of a practical remote
control situation.

1 This experiment was carried out in the Human
Factors Division Laboratories, Research Directorate,
Air Force Special Weapons Center, Kirtland Air
Force Base, Albuquerque, New Mexico under Air
Force Project 1811. Edward S. Halas assisted in
running the Ss.

Permission is granted for reproduction, translation,
publication, use and disposal in whole and in part
by or for the United States Government. The opin-
ions expressed in this paper are those of the authors
and do not necessarily reflect the views or have the
endorsement of the U. S. Air Force or U. S. Govern-
ment.


Method

Fifteen male right-handed Air Force officers were 
utilized as Ss. The task required S to follow a pur- 
suit rotor target with a stylus while viewing the 
rotor turntable and stylus tip in a 17 in. black and 
white television monitor screen. Each S was seated 
in a chair before a 6 in. turntable which revolved
in a counterclockwise direction at 1 rpm. The 
surface of the turntable was positioned in a hori- 
zontal plane 23 in. from the floor. The target was 
[illegible] in. in diameter and revolved at a constant radius of
1 in. around the turntable center. The stylus flexed 
in the vertical plane, but was rigid in the horizontal. 
Both turntable and stylus tip were obscured from 
S’s direct vision. 

A television camera pointed down toward the turn-
table at a 45° angle, which approximated what would
have been S’s normal line of sight to the unobscured
turntable. The visual field was displaced in the
horizontal plane by positioning the camera as illus-
trated in Fig. 1. The angular displacements of the
visual field around the turntable from
S’s normal unobscured line of sight were 30° (C)
and 90° (B) to S’s right and 90° (D) and 175° (A)
to S’s left. The monitor screen was positioned with
its center 175° around the turntable to the right of
S’s normal line of sight for each of the above camera
positions, thereby providing four experimental
conditions. An additional condition (E) was pro-
vided by positioning the camera as in condition (D),
but with the monitor positioned 90° around the
turntable to the right of S’s normal line of sight.
Both monitor positions presented an image of identi-
cal size, but slightly larger than would have been
expected with unobscured vision of the turntable.
The center of the turntable image on the monitor
was 25 in. from the floor and approximately 80 in.
from S’s eyes under both monitor conditions.


[Figure: top view showing S, the turntable, the camera positions, the monitor position for Conditions (A), (B), (C), and (D), the monitor position for Condition (E), and S’s normal unobscured line of sight.]

Fig. 1. Top view diagram of S, turntable, camera,
and monitor positions.


Table 1

Analysis of Variance of the Time on Target Scores

Source of Variance                 df    Sum of Squares    Mean Square       F
Total                              74       111,239
  Order of conditions               4        20,002            5,000        5.03*
  Residual between individuals     10         9,951              995
  Total between individuals        14        29,953
  Experimental conditions           4        60,408           15,102       38.82**
  Trials                            4           668              167
  Residual from latin square       12         7,166              597        1.83
  Residual within latin square     40        13,044              326
  Pooled error                     52        20,210              389
  Total within individuals         60        81,286

* Significant at 5% level of confidence.
** Significant at 1% level of confidence.

One 3-min. trial was given to each of the 15 Ss 
under each of the five experimental conditions. The 
intertrial interval was approximately 4 min. Time 
on target during each trial was electrically recorded. 
These time data were analyzed in a 5 × 5 latin
square with 3 replications. Trials constituted col- 
umns of the square, rows were orders, and cells were 
conditions. 
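
The five orders of presentation used here (listed later in Table 3) form a cyclic square: each order is the previous one shifted by one condition. A small sketch of how such a square can be generated and checked follows; the function names are illustrative, and the assumption that three Ss received each order follows from the 3 replications with 15 Ss.

```python
def cyclic_latin_square(conditions):
    """Return rows where row i is the condition list rotated left by i."""
    k = len(conditions)
    return [[conditions[(i + j) % k] for j in range(k)] for i in range(k)]

def is_latin_square(square):
    """Check that every symbol appears exactly once in each row and column."""
    symbols = set(square[0])
    rows_ok = all(set(row) == symbols and len(row) == len(symbols) for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok

# Rows are orders I to V; columns are trials 1 to 5.
orders = cyclic_latin_square(["A", "B", "C", "D", "E"])
```

In a cyclic square every condition is always preceded by the same neighbor, so carry-over from one condition to the next is not balanced across orders.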

Each S was carefully instructed as to his required 
performance before each experimental session and E 
provided a brief demonstration. Before each trial, S 
positioned the stylus on the stationary target through 
use of the monitor image. Otherwise, no practice 
was provided. The turntable and timer were started 
when S indicated he was on target and ready. In 
the case of Condition (E), S was instructed to main- 
tain normal body relation to the task but to turn 
his head to the right to view the monitor screen. 


Results 


Table 1 presents an analysis of variance of 
the time scores. These data meet the require- 
ments for homogeneity of variance and for 
homogeneity between latin square replica- 
tions. Therefore, significant F values are at- 


tributed to differences between means and 
the combined analysis is considered justified. 
Differences between experimental conditions
and between orders of presentation of the
conditions yielded the only significant F
values. The trials (columns in the latin
square) did not exhibit a significant F value. 
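
The F values in Table 1 can be retraced from the printed mean squares. In the sketch below, the pairing of each effect with its error term is inferred from the printed ratios rather than stated in the text, and the dictionary keys are illustrative names.

```python
# Mean squares as printed in Table 1 (analysis of variance of time on target).
MEAN_SQUARES = {
    "orders": 5000,
    "residual_between_individuals": 995,
    "conditions": 15102,
    "residual_from_latin_square": 597,
    "residual_within_latin_square": 326,
    "pooled_error": 389,
}

def f_ratio(effect, error):
    """F ratio: effect mean square divided by its error mean square."""
    return MEAN_SQUARES[effect] / MEAN_SQUARES[error]

f_orders = f_ratio("orders", "residual_between_individuals")
f_conditions = f_ratio("conditions", "pooled_error")
f_residual = f_ratio("residual_from_latin_square", "residual_within_latin_square")
```

With these pairings the three ratios round to the tabled 5.03, 38.82, and 1.83.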

Table 2 lists the means for each experi- 
mental condition. A mean difference of 19.3 
is required for significance at the 1% and 
14.5 at the 5% level of confidence based on 
the multiple t test or “least significant differ-
ence.” The estimate of error variance is the
“pooled error” mean square of 389, each mean
is based on 15 observations, and t is the 1%
two-tailed point from Student’s t tables with
52 df. Therefore, the mean of Condition A 
differed from B, D, and E; C from D, B, and 
E; and E from D at less than the 1% level. 
Condition E differed from B at the 5% level. 
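
The least significant difference quoted above can be reproduced from the error mean square and the number of observations per mean. This is a minimal sketch: the critical t values for 52 df are taken from standard two-tailed tables and are an assumption, as are the function names; the condition means are those of Table 2.

```python
from itertools import combinations
from math import sqrt

def least_significant_difference(t_crit, error_ms, n_per_mean):
    """LSD = t * sqrt(2 * error mean square / observations per mean)."""
    return t_crit * sqrt(2 * error_ms / n_per_mean)

# Two-tailed critical t values for 52 df, read from standard tables (assumed).
T_05, T_01 = 2.007, 2.674

lsd_05 = least_significant_difference(T_05, 389, 15)
lsd_01 = least_significant_difference(T_01, 389, 15)

# Mean time on target (sec.) per condition, from Table 2.
means = {"A": 117.1, "B": 53.9, "C": 102.7, "D": 42.9, "E": 69.4}

def classify(diff):
    """Label a mean difference by the strictest level it reaches."""
    if diff >= lsd_01:
        return "1%"
    if diff >= lsd_05:
        return "5%"
    return "n.s."

results = {
    (a, b): classify(abs(means[a] - means[b]))
    for a, b in combinations(sorted(means), 2)
}
```

The two thresholds round to 14.5 and 19.3, and the pairwise labels agree with the comparisons listed in the text: only A versus C and B versus D fall short of the 5% criterion.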

Table 3 lists the orders of presentation of
conditions and the respective mean for each
order. A difference of 81.6 was required at
the 1% and 57.4 at the 5% level of confi-
dence based on the multiple t test. The esti-
mate of error variance is the “residual be-
tween individuals” mean square of 995, each
mean is based on 3 observations, and t is the
applicable two-tailed point from Student’s t
tables with 10 df. Therefore, the mean of
Order III differed from II, IV, and V; I from
II, IV, and V; and V from IV at less than the
1% level. Order IV differed from II at less
than the 5% level.


Table 2

Mean Time on Target per Trial for
Experimental Conditions

Experimental      Mean Time
Condition         (sec.)
A                 117.1
B                  53.9
C                 102.7
D                  42.9
E                  69.4


Table 3

Mean Time on Target per Experimental Session
for Orders of Presentation of Experi-
mental Conditions

Order    Order of Conditions       Mean Time (sec.)
I        (A) (B) (C) (D) (E)       316.7
II       (B) (C) (D) (E) (A)       459.0
III      (C) (D) (E) (A) (B)       270.0
IV       (D) (E) (A) (B) (C)       400.3
V        (E) (A) (B) (C) (D)       483.7


Summary and Conclusions 


Different systematically displaced televised 
performance fields produced marked differ- 
ences in performance accuracy on a simple 
cyclic motor task. Average time on target
ranged from about two-thirds of the trial for the best
condition to less than one-fourth for the poorest con-
dition. The best performance occurred with
the visual field displaced 175°, i.e., with an 
approximately inverted image, and the worst 
performances with 90° displacements. A 
marked improvement was noted when the 
90° displaced field was normalized by repo- 
sitioning the monitor screen, but the proprio- 
ceptive cues remained 90° out of phase in 
Condition (E). The significant differences 
between orders of experimental conditions 
undoubtedly indicate a practice effect. How- 
ever, the present design does not lend itself 
to an analysis of these differences which may 
be due in part to an artifact introduced by 
the systematic diagonal Latin square that was 
employed. 
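
The share of each 3-min. trial spent on target follows directly from the condition means. A brief sketch using the best and poorest means from Table 2 (Conditions A and D); the variable names are illustrative.

```python
TRIAL_SECONDS = 3 * 60  # each trial lasted 3 min.

# Mean time on target (sec.) for the best and poorest conditions in Table 2.
best_mean, poorest_mean = 117.1, 42.9

best_share = best_mean / TRIAL_SECONDS
poorest_share = poorest_mean / TRIAL_SECONDS
```

The best condition corresponds to roughly 65% of the trial and the poorest to somewhat under 25%.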

The usefulness of closed circuit television 
as a means to provide visual feedback for a 
remote performance field appears to be seri- 
ously limited when the visual field is dis- 
placed. This limitation can be partially over- 
come by repositioning the monitor screen in 
the operator’s visual field in such a manner 
as to compensate for the camera displacement. 


Received July 16, 1958.


OPTIMAL INTERVAL LENGTH FOR VISUAL
INTERPOLATION:
THE EFFECT OF VIEWING DISTANCE 1


A. V. CHURCHILL 


Defence Research Medical Laboratories, Toronto, Canada 


In situations involving visual displays it is 
generally assumed that the “law” of the visual 
angle is applicable, i.e., that an increase or 
decrease in viewing distance must be accom- 
panied by a proportional increase or decrease 
in the dimensions of the display and thus 
maintain a constant visual angle. This as- 
sumption has led to the recommendation that 
in research on visual displays the standard to 
be employed when specifying the stimulus 
variable of size is “visual angle in degrees— 
or actual dimensions—(provided distance is 
also given)” (U. S. Armed Forces, 1950). 

A recent study of visual interpolation 2
(Churchill, 1956) disclosed a trend towards 
an optimal interval length when interpolating 
to tenths of a scale interval from a viewing 
distance of 28 inches. Subsequent trials 
from viewing distances of 56 inches and 84 
inches, with the same displays, yielded re- 
sults which suggested that the optimal length 
of interval was independent of viewing dis- 
tance, tending to be constant over the dis- 
tances tested. 

The experiments reported here were under- 
taken to determine (a) the optimal length of 
interval for interpolating in tenths, and (b)
the effect of viewing distance on the optimal 
interval length. 


Experiment 1 
Method 


Apparatus. The apparatus has been described else- 
where (Churchill, 1956). Seven horizontal scale in- 
tervals (0.25, 0.5, 0.75, 1.0, 1.5, 2.0, and 3.0 inches 
long) were used. A “0” appeared above the scale 
mark at the left extremity of the interval, and a 
“10” above that to the right. The separation be-
tween the pointer tip and the horizontal scale line
was 0.125 in. in the plane of the scale. Viewing dis-
tances were 28, 56, and 84 in. Display brightness
was 120 footlamberts for all conditions.

1 Defence Research Medical Laboratories Report
No. 164-8, PCC No. D77-94-20-27, HR No. 158.

2 By visual interpolation is meant the estimation
of the position of the pointer at unit positions, i.e.,
1-2-3...9, between the two boundaries designating
the extreme values of the scale interval, i.e., “0” and
“10.” Errors of interpolation are defined as those
estimations which are incorrect.

Procedure. Twenty-four laboratory personnel 
served as Ss. The six orders of presentation of the 
three viewing distances were each used four times 
and assigned to the Ss randomly. Scale intervals 
were presented in random order at each viewing dis- 
tance. Eighteen settings, two at each pointer posi- 
tion from 1 to 9 in random order, constituted a trial 
for each interval. Exposure time was 0.5 sec., with 
an interexposure period of 4 sec. for S’s response. 
The procedure was repeated, in a different random 
order, before changing the viewing distance. Since 
interpolating from the shorter scale intervals at the 
longer viewing distances presented a difficult visual 
problem, the 0.25-in. interval was not presented at 
56 in. nor were the 0.25-in. and 0.5-in. intervals 
presented at 84 in. 
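
The counterbalancing described above (six orders of the three viewing distances, each used four times and assigned to the 24 Ss at random) can be sketched as follows. The function name and the seed are illustrative assumptions; the original randomization procedure is not specified.

```python
import random
from itertools import permutations

VIEWING_DISTANCES = (28, 56, 84)  # inches

def assign_orders(n_subjects=24, seed=None):
    """Give each subject one of the six distance orders, each used equally often."""
    orders = list(permutations(VIEWING_DISTANCES))  # 3! = 6 orders
    repeats, remainder = divmod(n_subjects, len(orders))
    if remainder:
        raise ValueError("n_subjects must be a multiple of the number of orders")
    pool = orders * repeats
    random.Random(seed).shuffle(pool)  # random assignment to subjects
    return pool

assignment = assign_orders(seed=0)  # seed is arbitrary, for repeatability
```

Each element of `assignment` is the sequence of viewing distances for one S.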

Ss were instructed in the task and shown sample 
scales before beginning the trials. Interpolations 
were reported to the nearest unit. A “Ready” signal 
preceded each trial. 


Results 


The data are presented graphically in Fig. 1. 
Part A of Fig. 1 shows the effect of inter- 
val length on interpolation accuracy, with an 
optimal interval of 1.0 in. at the 28-in. view- 
ing distance. The effect of viewing distance
is shown by an optimal interval of 1.5 in. at
the 56-in. viewing distance and 1.0-1.5 in. at
the 84-in. viewing distance. Part B of Fig. 1
presents the same data in terms of visual
angle.

Fig. 1. Scale interpolation errors as a function of
(A) interval length, and (B) visual angle, at three
viewing distances.

It is apparent from these data that a scale 
interval length of 1.0-1.5 in. generates a 
minimum number of errors of interpolation, 
regardless of viewing distance. The fact that 
the three curves in Part B of Fig. 1 do not 
constitute a single curve is interpreted to 
mean that the “law” of the visual angle is not 
applicable to the displays and conditions un- 
der consideration here. 


Experiment 2 


In Experiment 1, the dimensions of the 
component parts of the intervals—line thick- 
ness, pointer dimensions and numeral size— 
were constant, interval length being the only 
factor varied. Consequently, the component 
parts (pointer, digits, etc.) subtended smaller 
visual angles at greater viewing distances. In 
Experiment 2 the dimensions of these parts 
were kept proportional to variations in in- 
terval length, i.e., the dimensions subtended 
the same visual angles at the different view- 
ing distances. 


Method 


Apparatus. A horizontal scale interval, with “0”
at the left and “10” at the right, was photographed 
with the pointer in turn at each of the 11 positions 
from “0” to “10” and sets of black on white slides 
were made from the photographs. Approximate di- 
mensions of the component parts of a projected 1-in. 
interval were: horizontal scale line, 1.0 × .03 in.
wide; vertical scale marks at the extremities, .20 ×
.03 in. wide; numerals “0” and “10,” .10 in. high;
pointer, .56 × .08 in. with a tip .03 in. wide; sepa-
ration between pointer tip and horizontal scale line,
.08 in. in the plane of the scale. The projection ap- 
paratus was such as to permit variations in interval 
length to be accompanied by proportional variations 
in the dimensions of all component parts. Display 
brightness was 10 foot-lamberts for all conditions. 

Procedure. Nine combinations of interval length 
and viewing distance (0.5-in. and 1.5-in. intervals at 
28 in.; 0.5, 1.0, 1.5, and 3.0 in. at 56 in.; 0.5, 1.5, 
and 4.5 in. at 84 in.) were presented in random or- 
der to each of the five laboratory personnel who 
served as Ss. Following a five-minute rest the pres-
entations were repeated in a different random order.
The intervals were selected to permit a comparison 
of two interval lengths (0.5 and 1.5 in.), and two
visual angles (approximately 1 and 3 degrees) at the
three viewing distances. 
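
The two visual angles quoted above can be checked with the usual formula for the angle subtended by a length viewed head-on, 2·arctan(length / (2 × distance)). The sketch below assumes the interval is centered on the line of sight; the function name is illustrative.

```python
from math import atan, degrees

def visual_angle_deg(length_in, distance_in):
    """Visual angle (degrees) subtended by a length viewed head-on at a distance."""
    return degrees(2 * atan(length_in / (2 * distance_in)))

# (interval length, viewing distance) pairs from Experiment 2, in inches.
small_angle_pairs = [(0.5, 28), (1.0, 56), (1.5, 84)]
large_angle_pairs = [(1.5, 28), (3.0, 56), (4.5, 84)]

small_angles = [visual_angle_deg(l, d) for l, d in small_angle_pairs]
large_angles = [visual_angle_deg(l, d) for l, d in large_angle_pairs]
```

Within each set the length-to-distance ratio is constant, so the three combinations subtend the same angle, close to 1 degree for the first set and close to 3 degrees for the second.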

A trial consisted of 20 exposures. The first two at 
pointer positions “0” and “10” respectively oriented 
S while the remaining 18 were a randomized series 
of pointer positions 1 to 9, twice each. Exposure 
time was .25 sec., with an interexposure period of 
4 sec. for S’s response. The procedure was repeated 
one week later. 


Results 


Data were tabulated in percentage of inter- 
polations in error, and transformed to degrees
(angle = sin⁻¹ √p) to satisfy the assumptions of
analysis of variance (Quenouille, 1950). Re- 
sults of the analysis of the nine display con- 
ditions, for the two days, are presented in 
Table 1. 
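
The angular transformation and its inverse are simple to state in code. The sketch below also computes the standard large-sample variance of the transformed value, (180/π)² / (4n) in squared degrees; reading the “theoretical residual” of 45.6 reported with the analysis of variance as this quantity with n = 18 interpolations per trial is an inference, not something the text states. Function names are illustrative.

```python
from math import asin, degrees, pi, radians, sin, sqrt

def to_angle(p):
    """Angular transform of a proportion p, in degrees."""
    return degrees(asin(sqrt(p)))

def to_proportion(angle_deg):
    """Inverse transform: degrees back to a proportion."""
    return sin(radians(angle_deg)) ** 2

def theoretical_variance(n):
    """Approximate variance (degrees squared) of the transform for n binomial observations."""
    return (180 / pi) ** 2 / (4 * n)

angle = to_angle(0.25)                  # a 25% error rate on the angular scale
residual_18 = theoretical_variance(18)  # 18 interpolations per trial
```

Means computed on the angular scale are converted back to percentages with `to_proportion`, which is the retransformation step used for the condition means.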

Table 1 shows an error term of the same 
order of magnitude as the theoretical residual, 
indicating that the performance of Ss is con- 
sistent from trial to trial within the same day. 
The significant S x D interaction shows that 
the over-all level of performance changes from 
day to day in a manner dependent on indi- 
vidual Ss. The absence of inflated interac-
tions involving conditions indicates the re-
producibility of performance under these con-
ditions across Ss and Days.

The means of percentages of interpolations 
in error (retransformed from degrees), for 
the nine conditions, are shown in Table 2. 

Table 2 shows that with the smaller visual
angle (1 degree), errors decrease as viewing
distance is increased. With the larger visual
angle (3 degrees), errors decrease as viewing
distance is decreased. It is to be noted that
in both comparisons the 1.5-in. interval is
optimum, although it subtends the smaller
angle (1 degree at 84 in.) in one instance,
and the larger angle (3 degrees at 28 in.) in
the other. It is also apparent that the 1.5-in.
interval is optimum at all three viewing
distances.


Table 1

Analysis of Variance of Nine Combinations of
Interval Length and Viewing Distance

Source               df     Mean Square

S (Subjects)                   839.5*
C (Conditions)                1782.4*
D (Days)                         9.7
C x D                           38.4
S x C                           52.8
S x D                          287.2*
S x C x D                       53.0
Error                           37.3

Total                179

* p < .01. Theoretical Residual 45.6.


Table 2

Means of Percentages of Interpolations in Error
(Based on 18 Observations X 2 Trials X 2 Days
X 5 Subjects)

Viewing            Scale Interval Length (inches)
Distance
(inches)      0.5      1.0      1.5      3.0      4.5

28           25.7               6.7
56           35.7     14.8      8.5     19.0
84           46.2               9.4              23.7

The data from Table 2 are presented 
graphically in Fig. 2. 

Parts A and B of Fig. 2 show the plots for
comparable interval lengths and visual angles,
respectively, at the three viewing distances.
The difference in the slope of the
lines representing interval length in Fig. 2
(A) reflects the fact that interpolation to
tenths of the 0.5-in. interval becomes more
difficult as viewing distance is increased. The
difference in the slope of the lines representing
visual angle in Fig. 2 (B) indicates the
absence of angular constancy.


Fig. 2. Scale interpolation errors as a function of
(A) interval length, and (B) visual angle, at three
viewing distances. [Ordinate: percentage of interpolations
in error. Abscissae: scale interval length (inches) and
visual angle (degrees). Separate curves for viewing
distances of 28, 56, and 84 in.]


Discussion 


The recommendation (U.S. Armed Forces, 
1950), based on the assumption that the 
“law” of the visual angle applies to situa- 
tions involving visual displays, implies that 
the specification of display size in terms of 
visual angle is synonymous with the specifi- 
cation of display size in actual dimensions 
and viewing distance. 

The results of the present study suggest 
that these two modes of specifying the stimu- 
lus variable of size are not synonymous. 
“Actual dimensions” appears to be a more 
crucial factor than “visual angle.” 

In the classical size-constancy experiment 
the observer is required to “size-match” 
standard and comparison stimuli which are 
presented simultaneously at different dis- 
tances from the observer. The task involved 
in the present study—visual interpolation— 
does not require “size-matching” and is thus 
an unusual approach to the size-constancy 
problem. 

It is evident from the results reported here 
that the kind of constancy demonstrated by 
the classical size-constancy experiment is not 
confined to situations involving the “size- 
matching” procedure. The fact that viewing 
distance has no effect on the optimal interval 
length signifies the existence of a constancy 
effect where size, per se, is not the dimension 
being judged. 


Summary and Conclusions 


Two experiments were conducted to estab- 
lish the optimal length of interval for visual 
interpolation in tenths and to determine the 
effect of viewing distance on the optimal interval
length. Results of Experiment 1 show
that an interval length of 1.0-1.5 in. generates
a minimum number of errors of interpolation 
at the three viewing distances (Fig. 1). Re- 
sults of Experiment 2 show an optimal inter- 
val length of 1.5 in. which is not affected by 
changing the viewing distance from 28 to 56 
or 84 inches (Fig. 2).

From these results it is concluded that the
“law” of the visual angle does not apply un- 
der the conditions tested. It is suggested that 
display dimensions and viewing distance be 
stated when specifying display size, rather 
than combining these dimensions and speci- 
fying display size in terms of visual angle. 


Received July 18, 1958.


The Wonderlic Personnel Test, a short
group test of mental ability designed espe- 
cially for industrial testing use, is available 
in five alternate forms, A, B, D, E and F, 
each of which includes 50 test items and in- 
volves 12 minutes of testing time. Forms 
D, E and F were developed by utilizing test 
items from the Otis Self-Administering Test 
of Mental Ability—Higher Form, while forms 
A and B were developed independently at a 
later time by Wonderlic. 

Wonderlic (1945) refers to the five forms 
of the Personnel Test as being equal and 
similar. Accordingly, the published norms 
are not differentiated by form, but are con- 
sidered applicable to any form. However, if 
it were found that any of the forms are sig- 
nificantly easier or more difficult than the 
others, using the same norms would affect 
the selection process. Hay (1952) reported 
a study in which 400 young women appli- 
cants for clerical positions in a large organi- 
zation were given both Forms D and F of 
the Personnel Test. He found that Form 
F was significantly easier than Form D at 
the 1% level of confidence. Weaver and 
Boneau (1956) reported a study concerning 
the comparability of all five forms of the 
Personnel Test in an academic testing situa- 
tion involving 70 Ss. Of the 10 differences 
between the means of pairs of test forms, 
nine significant differences were reported: two 
at the 2% level, three at the 1% level, and 
four at the .1% level. Only forms D and E 
did not differ significantly from each other. 

The present study was initiated in order 
to investigate the comparability of all five 
forms of the Personnel Test in an industrial 
testing situation, the kind of situation for 
which the test was especially designed. 


1 Formerly at Wayne State University. 


COMPARABILITY OF WONDERLIC TEST FORMS IN 
INDUSTRIAL TESTING 


LEONARD J. KAZMIER 
Ohio State University 


AND C. G. BROWNE
Wayne State University 




Sample and Procedure 


The Ss were 590 male applicants for an industrial 
apprenticeship program involving such trades as tool 
making, die making and plumbing pipe fitting in a 
large manufacturing company. The formal educa- 
tion of the applicants ranged from completion of the 
eighth grade to completion of college, with a mean 
of 11.77 years. Their ages ranged from 17 to 38 
years, with a mean of 21.81 years. 

The Ss were tested in 16 sessions, so that about 
27 Ss were tested in each session. The seating of 
the Ss in the examination room was by their own 
choice. All five forms of the Personnel Test were 
administered in all sessions in such a way that every 
fifth man took Form A, every fifth man took Form 
B, and so forth for Forms D, E and F. Thus, 118 
Ss took each form. It is assumed that the sys- 
tematic distribution of test forms resulted in a high 
degree of randomization of the Ss in terms of the 
abilities being measured. In addition, this sys- 
tematic manner of distribution avoided having the 
same test form administered to those sitting next 
to each other. Before the test began, the instruc- 
tions given on the first page of the test were read 
aloud and time was allowed to answer the sample 
questions on that page. The usual 12-minute time 
limit was used. 
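
The systematic distribution of forms described above amounts to a fixed rotation through the five forms. The sketch below is an added illustration only: it ignores session boundaries and seating, which the text does not specify in detail, and the function name `assign_forms` is hypothetical.

```python
from collections import Counter

FORMS = ["A", "B", "D", "E", "F"]

def assign_forms(n_subjects, forms=FORMS):
    """Give each successive applicant the next form in a fixed rotation."""
    return [forms[i % len(forms)] for i in range(n_subjects)]

assignment = assign_forms(590)
counts = Counter(assignment)
print(counts)

# No two consecutive applicants receive the same form.
neighbours_differ = all(a != b for a, b in zip(assignment, assignment[1:]))
print(neighbours_differ)
```

With 590 applicants the rotation yields 118 per form, and adjacent applicants never share a form, which matches the two properties the procedure relies on.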

Wonderlic (1945) has suggested that certain num- 
bers of points be added to the scores of those Ss 
who are 30 years of age and over, the number of 
points varying by age group. Twenty-nine of the 
590 Ss in this experiment were between 30 and 38 
years of age, so three points were added to each of 
their scores, as suggested in the test manual. How- 
ever, the mean score for each of the five forms was 
computed, both before and after the corrections for 
age were made. The over-all significance of the 
observed differences among both mean uncorrected 
and corrected scores was measured through calcula- 
tions of F ratios. The Duncan Multiple Range Test 
(Duncan, 1955) was then used to test all differences, 
taken two at a time, at the 5% level of significance. 

In his table of norms, Wonderlic (1945) also has 
indicated that there is a positive relationship be- 
tween years of education and test scores achieved. 
Since it was assumed that a high degree of randomi- 
zation of the Ss was achieved through the system- 
atic distribution of test forms, only chance differ- 
ences in educational level should exist among the 
Ss taking the five forms. Therefore, the mean years


Table 1

Mean Scores and Variances by Personnel Test Form
(N = 118)

                      Form A    Form B

Mean                   20.62     16.79
S²                     51.55     39.64
Corrected Mean*        20.77     16.87
Corrected S²*          55.54     42.46

* Scores corrected for age.


of education by test form were computed, and the 
significance of differences among those taking the 
five forms was determined by calculating the F ratio. 

Finally, the numerical differences among the un- 
corrected mean scores of the five forms found in the 
present study were compared with the differences 
reported by Hay (1952) and by Weaver and Boneau 
(1956). The purpose of this was to observe if the 
direction of the interform differences tended to be 
the same, that is, if the same forms were consistently 
higher or lower in comparison with the other forms. 


Results 


Wonderlic (1945) suggests that Forms A 
and B or Forms D and F be paired when two 
alternate forms of the Personnel Test are 
used. In the present study, neither of these
pairs of forms was found to be mutually
comparable when the scores were corrected 
for age. The mean uncorrected scores for the 
five forms ranged from 16.79 for Form B to 
21.13 for Form F, a total range of 4.34 score 
points, while the mean corrected scores ranged 
from 16.87 for Form B to 21.38 for Form F, 
a total range of 4.51 score points. The means 
and variances for all of the forms, both un- 


corrected and corrected for age, are presented 
in Table 1. The analyses of variance, in- 
cluded in Table 2, indicate that the observed 
differences among the means are significant 
at the 1% level of confidence whether or not 
the scores are corrected for those 30 years 
of age or older. 
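
The F ratios reported for the two analyses can be recomputed from the sums of squares and degrees of freedom given in Table 2. The following is an added sketch of that arithmetic for a one-way analysis of variance; the function name `f_ratio` is hypothetical.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F) for a one-way analysis."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Sums of squares and degrees of freedom as reported in Table 2
uncorrected = f_ratio(1355.8, 4, 23754.5, 585)
corrected = f_ratio(1418.8, 4, 23589.7, 585)

print([round(x, 2) for x in uncorrected])
print([round(x, 2) for x in corrected])
```

The recomputed ratios agree with the tabled values of 8.35 and 8.80.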

The 10 possible differences between the 
means of both uncorrected and corrected 
scores are presented in Table 3. The number of 
score point differences between pairs of mean 
uncorrected scores ranged from .19 between 
Forms D and E to 4.34 between Forms B 
and F, while the differences between pairs of 
mean corrected scores ranged from .29 be- 
tween Form D and E to 4.51 between Forms 
B and F. In almost every case, correcting 
the scores for those 30 years of age or over 
had the effect of increasing obtained differ- 
ences between the means of pairs of test 
forms. In only one case was the difference 
reduced, that being the difference between 
the mean scores of Forms A and E, which was 
reduced from .96 to .93 score points by apply- 
ing the correction. On the basis of the Dun- 
can Multiple Range Test, the mean score for 
Form B differed from all of the other forms 


of the Personnel Test at the 1% level of significance
whether or not corrections are made
for age factors. In addition, Forms D and F 
differed from each other at the 5% level of 
significance when corrections for age were 


made. It had been anticipated that since 
Forms A and B are the most recently con- 
structed forms of the test, there would be less 
difference between these two forms than be- 


Table 1 (continued)

                      Form D    Form E    Form F

Mean                   19.47     19.66     21.13
S²                     34.28     45.29     35.68
Corrected Mean*        19.55     19.84     21.38
Corrected S²*          37.51     48.50     45.82

* Scores corrected for age.


Table 2

Analyses of Variance for Difference Among Scores on Personnel Test Forms

                                Sum of              Mean
Source of Variance              Squares      df     Square      F       p

Between Forms                   1,355.8       4     339.0      8.35    <.01
Within Forms                   23,754.5     585      40.6
Total                          25,110.3     589

Between Forms Corrected*        1,418.8       4     354.7      8.80    <.01
Within Forms Corrected*        23,589.7     585      40.3
Total

* Scores corrected for age.


Table 3

Differences Between Mean Scores on Personnel
Test Forms
(N = 118)a

Form                   A        B         D        E        F

A   uncorrected        X      3.83**    1.15      .96
    corrected                 3.90**    1.22      .93      .61
B   uncorrected                 X                         4.34**
    corrected                                             4.51**
D   uncorrected                           X       .19     1.66
    corrected                                     .29     1.83*
E   uncorrected                                    X      1.47
    corrected                                             1.54
F                                                           X

a The second line of each pair gives differences between means of
scores corrected for age.
* Significant at the 5% level on the basis of the Duncan
Multiple Range Test.
** Significant at the 1% level on the basis of the Duncan
Multiple Range Test.


tween other pairs of test forms. As indicated 
in Table 3, this was not the case. The differ- 
ence between the uncorrected mean scores for 
Forms A and B was 3.83 score points, while 
the corrected scores differed by 3.90 score 
points. These differences were exceeded only 
by the differences between Forms B and F. 
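
The ten pairwise differences discussed above follow directly from the five form means reported in Table 1. The sketch below is an added illustration that recomputes them from the rounded means, so a recomputed value could differ from a tabled one in the last digit; the function name `pairwise_differences` is hypothetical.

```python
from itertools import combinations

# Form means as reported in Table 1
uncorrected = {"A": 20.62, "B": 16.79, "D": 19.47, "E": 19.66, "F": 21.13}
corrected = {"A": 20.77, "B": 16.87, "D": 19.55, "E": 19.84, "F": 21.38}

def pairwise_differences(means):
    """Absolute difference between every pair of form means, rounded to two places."""
    return {
        (a, b): round(abs(means[a] - means[b]), 2)
        for a, b in combinations(means, 2)
    }

diff_uncorrected = pairwise_differences(uncorrected)
diff_corrected = pairwise_differences(corrected)

for pair in diff_uncorrected:
    print(pair, diff_uncorrected[pair], diff_corrected[pair])
```

The largest difference in both sets is between Forms B and F, and the difference between Forms A and E is the only one that becomes smaller after the age correction.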
That these differences among test forms 
cannot be ascribed to a chance variation 
among the Ss in regard to their educational 
level was demonstrated by testing the signifi- 
cance of observed differences in educational 
level for those taking the different forms of 
the test. The mean years of education by 
test form ranged from 11.74 for Form A to 
11.80 for Form D, a total of just .06 years. 
The variances ranged from .80 for Form F 
to 1.85 for Form A. The mean number of 
years of education and the variances involved 


Table 4 


Mean Years of Education and Variances by 
Personnel Test Form 


Table 5

Analysis of Variance Table for Differences in Years
of Education by Personnel Test Form

Source of          Sum of              Mean
Variance           Squares     df      Square      F        p

Between Forms         5.7       4       1.42      1.21    >.05
Within Forms        681.0     585       1.17

Total               686.7     589


for all of the forms are summarized in Table 
4. The F ratio for the observed differences 
is not significant at the 5% level being tested, 
as indicated in Table 5.²

The mean scores found in the present study 
are considerably lower than those reported by 
Hay (1952) and by Weaver and Boneau 
(1956), as indicated in Fig. 1. This, of 
course, is a function of the different types of 
samples involved in each study. However, 
the order and magnitude of differences among 


Fig. 1. Comparison of the mean scores obtained
by Hay and by Weaver and Boneau with those obtained
in the present study. [Ordinate: mean scores.
Abscissa: Wonderlic form. Separate curves for Hay,
for Weaver and Boneau, and for the present study.]


(N = 118)
Form      A       B       D       E       F
Mean    11.74   11.77   11.80   11.76   11.76
S²       1.85    1.19    1.17     .82     .80


2 Hartley’s Test (David, 1952) employed at the 
5% level, indicates that heterogeneity of variance 
exists. However, since the effect of the heteroge- 
neity is to inflate the F that is obtained, no correc- 
tion is necessary in the present situation involving 
a nonsignificant F.


test forms that were found in these studies
are consistent. In the studies involving all 
five forms of the Personnel Test, Form B had 
the lowest mean score, Forms D and E dif- 
fered only slightly from each other, and Form 
F had the highest mean score. A discrepancy 
exists, however, in the relative position of the 
Form A mean score in the two studies. The 
mean score for Form A was relatively higher 
in the present study than it was in the study 
by Weaver and Boneau (1956). On the 
basis of their study in an academic situation, 
Weaver and Boneau suggested that the forms 
fall roughly into two groups, Forms A and B 
comprising a group of greater difficulty and 
higher variability than Forms D, E and F. 
This hypothesis is not supported by the re- 
sults of the present study, which was con- 
ducted in an industrial testing situation. 


Summary and Conclusions 


Sixteen groups consisting of 590 male ap- 
plicants for apprenticeship programs in a 
large manufacturing company were tested 
using all five forms of the Wonderlic Person- 
nel Test (Forms A, B, D, E and F). Every 
fifth man took Form A, every fifth man took 


Form B, and so forth for Forms D, E, and F, 
so that 118 Ss took each form. For those Ss 
30 years of age or over, obtained scores were 
corrected by adding score points as suggested 


in the test manual. The mean scores by 
Wonderlic form were computed, and the sig- 
nificance of obtained differences among the 
forms was tested by calculating F ratios and 
by using the Duncan Multiple Range Test to 
examine all possible differences between pairs 
of means for both uncorrected and corrected 
scores. In order to ascertain that obtained 
differences were not due to a chance distribu- 
tion of Ss in regard to educational level, the 
significance of difference in years of education 
by test form was also tested by calculating 
the F ratio. 

It was found that Form B of the Personnel 
Test was more difficult than any of the other 
forms at the 1% level of significance, whether
or not score corrections for age are made, and
that Forms D and F differed from each other 
at the 5% level of significance when such cor- 
rections are made. These differences could 
not be ascribed to differences in the educa- 
tional level by test form, which were not sig- 
nificant at the 5% level tested. The direction 
and magnitude of differences among the forms 
were found to be similar to differences re- 
ported in previous studies of the Personnel 
Test. 

On the basis of this study, it is recom- 
mended that Form B of the Personnel Test 
not be regarded as directly equivalent to any 
of the other four forms of the test and that 
Form D not be regarded as directly equivalent 
to Form F in industrial testing situations 
similar to the one in the present study. These 
findings are particularly pertinent, since Won- 
derlic suggests that when two forms of the 
test are to be used, the best combinations are 
A and B or D and F. Neither of these pairs of forms
was found to be mutually comparable when
the suggested scoring procedure was followed. 


Received July 21, 1958.


Gebhart and Hoyt (1958) have presented
data showing that several scales of the Ed- 
wards Personal Preference Schedule (EPPS) 
discriminate between over- and underachievers 
in two Schools at Kansas State College. 
These results were of considerable interest to 
the writer, since in an unpublished study of 
apparently identical design, he had found no 
significant differences. In an effort to under- 
stand the discrepant findings, one difference 
in procedure became apparent. At Carnegie 
Tech, optimal prediction of academic per- 
formance in the College of Engineering and 
Science is obtained by an equation which 
employs three achievement tests² and high
school standing as predictors. The equation 
produces stable cross-validity coefficients in 
the .65 to .70 range. The best estimate of 
performance, and hence the definition of a 
baseline for over- or underachievement is thus 
based on measures of past performance. In 
the Gebhart-Hoyt study, aptitude measures³
were employed to define expected perform- 
ance. If it could be shown that this difference 
in procedure was responsible for the discrep- 
ancy in results, it would seem to offer addi- 
tional support for the EPPS as a valid device 
for the description of differential achievement, 
since it would indicate that the scales were 
reflecting personality characteristics associ- 
ated with a past as well as a future record 
of over- or underachievement. 

The present study had two objectives. 
First, it was designed to replicate the Gebhart- 
Hoyt study in regard to engineers and, as an 
extension, to test the difference between 
aptitude-based and performance-based deter- 
minations of expected performance. 


1 The clerical and computational assistance of Bar- 
bara Woods is gratefully acknowledged. 

2 College Entrance Examination Board tests in 
English composition, advanced math, and physics 
or chemistry. 

3 Pre-Engineering Ability Test for the School of
Engineering and the ACE for the School of Arts
and Sciences.


Procedure


The samples were drawn from the population of 
411 freshmen who entered the College of Engineer- 
ing and Science in September, 1956. Two predic- 
tions of grade average were made for each student. 
The performance-based prediction employed the three 
achievement tests and high school standings; the 
aptitude-based prediction used the verbal and math 
scores from the College Board Scholastic Aptitude 
Test. Treating each set of predictions separately, 
students were categorized as low, average, or high 
predicted. A student was assigned to the over- 
achievement group if his first-year grade average was 
above the predicted score, to the underachievement 
group if the average was below that predicted. A 
student thus might be an overachiever on one basis 
and an underachiever on the other. 
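
The classification rule just described can be sketched in a few lines. This is an added illustration, not the author's procedure as implemented: the dividing points of 1.80 and 2.20 are those given in the footnote on predicted averages, while the treatment of a predicted average falling exactly on a dividing point, and of an achieved average exactly equal to the prediction, is an assumption. All function names are hypothetical.

```python
def predicted_level(predicted_gpa, low_cut=1.80, high_cut=2.20):
    """Categorize a predicted grade average as low, average, or high."""
    if predicted_gpa < low_cut:
        return "low"
    if predicted_gpa < high_cut:  # boundary handling is an assumption
        return "average"
    return "high"

def achievement_group(predicted_gpa, achieved_gpa):
    """Over- or underachiever relative to one prediction; None if exactly equal (assumed)."""
    if achieved_gpa > predicted_gpa:
        return "over"
    if achieved_gpa < predicted_gpa:
        return "under"
    return None

def classify(aptitude_pred, performance_pred, achieved_gpa):
    """Classify one student separately on each basis of prediction."""
    return {
        "aptitude": (predicted_level(aptitude_pred),
                     achievement_group(aptitude_pred, achieved_gpa)),
        "performance": (predicted_level(performance_pred),
                        achievement_group(performance_pred, achieved_gpa)),
    }

# Hypothetical student: the two predictions disagree about the baseline.
example = classify(aptitude_pred=1.97, performance_pred=2.65, achieved_gpa=2.40)
print(example)
```

The hypothetical student shown is an overachiever on the aptitude basis and an underachiever on the performance basis, which is the situation the last sentence of the paragraph allows for.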

The performance-based sample was constructed by 
selecting from each of six groups (two levels of 
achievement and three levels of predicted achieve- 
ment), the 20 Ss whose obtained average was most 
discrepant from that predicted. The aptitude-based 


Table 1

Mean First Year Grades (Predicted and Achieved)
for the Two Samples
(N = 20 per cell)

Group                           Low      Average     High

Aptitude:
  Over     Predicted           1.66       1.97       2.56
           Achieved            2.53       2.99       3.50
  Under    Predicted           1.68       2.04       2.46
           Achieved            1.02       1.13       1.49
Performance:
  Over     Predicted           1.46       1.98       2.65
           Achieved            2.40       2.80       3.36
  Under    Predicted           1.57       2.00       2.68
           Achieved            1.03       1.27       1.81


4 Predicted averages of 1.80 and 2.20 were dividing 
points. The grading system assigns 4 points for an 
A, 3 for a B, 2 for a C, 1 for a D, and 0 for an R.
The mean grade average for the freshman class is
2.00; the prediction equations give an identical mean
for the distribution of predicted grades.


Table 2


EPPS Scores for Groups of Over- and Underachievers 


Aptitude-base 


Performance-base 


Achievement 
Deference 
Order 
Exhibition 
Autonomy 
Succorance 
Affiliation 
Intraception 
Dominance 
Abasement 
Nurturance 
Change 
Endurance 
Heterosexuality 
Aggression 13.2 


b p < .01. c p < .001.

sample was formed by an identical procedure, using 
the aptitude-based prediction to define the groups. 
Table 1 presents the average predicted grade and 
the average achieved grade for each subgroup. 
Seventy-three Ss were common to the two samples, 


a fact which reflects the positive correlation between 
the aptitude and achievement tests used as predictors. 

Scores for the 15 EPPS scales were collected for 
all Ss. The inventory had been completed during 
the freshman orientation program. An analysis of 


Table 3 


EPPS Scores for Groups at Three Levels of Predicted Ability: Aptitude-based Sample 


Scale 


Achievement 
Deference 
Order 
Exhibition 
Autonomy 
Succorance 
Affiliation 
Intraception 
Dominance 
Abasement 
Nurturance 
Change 
Endurance 
Heterosexuality 
Aggression 


Average 


15.8 
15.8 
14.4 
12.1 


a p < .05. b p < .01. c p < .001.


-- F < 1.0.



4 Over Under Over Under : 

| en (N = 60) (N = 60) (N = 60) (N = 60) ‘ 
Scale M M o F M M F 
4.1 13.098 1715 35 161 46 3.938 

43 11.2 3.1 12.6 4.5 3.85 

4.8 5.64" 12.7 4.0 11.9 5.0 

3.4 14.0 3.0 14.2 3.6 

4.1 14.6 4.1 15.1 4.1 

4.2 9.7 4.5 10.0 48 

43 9.31» 13.3 3.9 143 43 1.64 

5.9 2.13 15.3 5.1 149 6.0 

4.5 16.4 4.2 15.1 5.1 2.42 

4.8 1.80 13.3 4.2 129 48 

48 — 119 3.8 647 
43 3.55 15.4 4.6 16.0 43 : 
5.7 174 48 164 5.8 1.13 
6.2 4.538 15.4 6.5 15.2 7.3 : 
4.9 3.25 12.7 4.6 13.6 4.3 1.22 

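                      Low            Average           High
Scale               M      σ        M      σ        M      σ        F

Achievement       15.9    3.8     17.1    3.8     17.2    3.9     1.57
Deference         13.3    3.4     12.0    3.9     11.1    3.9     3.71a
Order             12.9    4.1     12.1    4.8     10.4    4.4     3.38a
Exhibition        14.4    3.1     14.1    3.7     14.7    3.4
Autonomy          14.5    4.0     14.3    3.9     14.5    4.0
Succorance        10.7    3.3      9.9    4.9      9.6    4.7
Affiliation       13.8    4.8     14.3    3.9     14.2    3.5      --
Intraception      14.8    5.5     15.8    5.4     15.8    5.0
Dominance         14.0    4.4     16.6    4.2     17.6    3.8     8.11c
Abasement         15.1    4.1     13.6    5.1     11.6    3.8     6.41b
Nurturance        13.4    4.7     13.5    4.7      1.8    3.5     1.84
Change            15.3    5.4     15.8    4.2     15.0    3.9
Endurance         17.4    4.8     15.8    5.1     16.7    6.1
Heterosexuality   13.6    5.5     14.4    6.6     17.4    5.9     4.49
Aggression        14.0    4.4     12.1    4.4     13.2    4.4     1.80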


variance using a 2 X 3 factorial design was applied 
to each of the 15 scales in each of the two samples. 


Results 


Table 2 presents means and standard devia- 
tions for the groups of over- and underachiev- 
ers in each sample, as well as the F ratios for 
the differences between groups. 

For the aptitude-based sample, the hypothe- 
sis of no difference between groups of over- 
and underachievers was rejected for five 
scales. Overachievers in a college of engineer- 
ing scored significantly higher on the Achieve- 
ment, Order, and Endurance scales, and 
significantly lower on Affiliation and Hetero- 
sexuality. 

For the performance-based group, the null 
hypothesis was accepted for all scales except 
Achievement. 

Table 3 presents the means and standard 
deviations for groups at each level of pre- 
dicted achievement for the aptitude-based 
sample. In order to conserve space, the 
equivalent data for the performance-based 
group is not presented. It is true of the latter 
sample that the null hypothesis may be ac- 
cepted for all scales. 

In Table 3, correlations between level of 
predicted ability and five of the EPPS scales 
are observable for the aptitude-based sample. 
In addition to the tabled results, significant 
interactions (between groups and levels) were 
found for the Deference, Succorance, and 
Endurance scales in the aptitude-based group. 


Discussion 


One objective of the study was to replicate 
the Gebhart-Hoyt study. The results for the 
aptitude-based sample provide the relevant 
data for comparison. The agreement between 
the two studies is considerable. One measure 
of the agreement is the correlation between 
the Kansas State and Carnegie Tech engineer- 
ing samples. The 30 pairs of means (two 
achievement groups and 15 scales) correlate 
.772.


5 In the Gebhart-Hoyt study, means and standard
deviations were presented for two colleges combined.
Hoyt kindly provided the writer with the data for
their engineering sample alone, thus enabling the
comparison which is reported here.

Furthermore, the hypothesis of Gebhart
and Hoyt that there are several patterns
of over- and underachievement is clearly sup- 
ported. In the present study, we may char- 
acterize the overachieving engineer as one 
having a strong need to achieve, or a need 
to keep things orderly, or a need to endure 
in a task, and these traits are not correlated 
within the sample of overachievers. The ob- 
tained correlations of —.05, .17, and .16 
clearly indicate that different individuals are 
contributing the high scores to the different 
scales. The same situation prevails in regard 
to the Affiliation and Heterosexuality scales 
on which underachievers earn high scores. 
The correlation between the scales is —.14 
within the sample of underachievers. 

The second purpose of the study was to 
contrast two bases for determining the ex- 
pected performance of the college freshman. 
Despite the fact that a majority of the Ss 
are in both groups, the results seem quite 
clear. If we base our estimate on measures 
of aptitude, the EPPS makes a significant 
contribution toward reducing the residual 
variance (over- and underachievement). If 
we base our estimate on records of past per- 
formance, this contribution tends to wash out. 
It might be argued that the results simply 
reflect the fact that the performance-based 
estimate is the more valid and hence less 
reliable variance is available for the person- 
ality scales to explain. While the fact of a 
less reliable residual is indisputable, the mat- 
ter is of little concern. What seems impor- 
tant is the demonstration that two measures 
are functionally equivalent. The variance 
which is explained by past performance, but 
not by ability, may also be explained by cer- 
tain scales of the EPPS. These scales reflect 
a difference between capacity and perform- 
ance which is useful for predictive purposes, 
but which is also available from records of 
past performance. Since a theory of over- 
or underachievement is interested in account- 
ing for behavior which ability measures do 
not predict, the findings of Gebhart and Hoyt 
replicated in the present study seem of real 
value. 

If one is interested only in the problem of 
prediction, the past performance measures will 
often be preferred to a personality inventory 
on grounds of practicality.


Summary and Conclusions


The major objectives of the study were 
(a) to investigate the relationship between 
EPPS scale scores and over- and under- 
achievement in a college of engineering and 
(b) to examine two bases (aptitude tests vs. 
achievement records) for determining an ex- 
pected level of performance. 

Two samples, each consisting of 120 Ss, were
selected; the samples were termed aptitude- 
based and performance-based. Seventy-three 
Ss were in both samples. In each sample 
there were 20 Ss at each of three levels of 
expected performance for both over- and 
underachievement groups. In regard to the 
first stated objective, analyses of variance of 
the aptitude-based sample permit the follow- 
ing conclusions: 


1. Overachievers scored significantly higher 
on the Achievement, Order, and Endurance 
scales, and significantly lower on Affiliation 
and Heterosexuality. 

2. These scales are statistically independent 
within the relevant samples, indicating that 
several patterns of over- and underachieve- 
ment are present. 

3. High ability Ss score significantly higher 
than low ability Ss on Dominance and Hetero- 
sexuality and significantly lower on Deference, 
Order and Abasement. 

4. Significant interactions between ability 
level and over- and underachievement were 
present for the Deference, Succorance, and 
Endurance scales. 


In regard to the second objective, the sepa- 
rate analyses of performance-based and apti- 
tude-based groups lead to the following 
conclusions: 


1. When over- and underachievement are 
taken as departure from a regression line 
based on achievement tests and high school 
record, only the Achievement scale of the 
EPPS discriminates between the two groups. 
In addition, the correlations between scale 
score and ability level disappear, and there 
are no significant ability-achievement inter- 
actions. 

2. The variance which EPPS scores account 
for is the same variance that is explained by 
an S’s past record of ability-performance 
differential. 

3. Theories of over- and underachievement 
may start with the personality description of 
Ss who deviate from an aptitude-based regres- 
sion line. Certain of the EPPS scales provide 
labels descriptive of this behavior. 

4. For purposes of selection, the EPPS and 
certain evidences of past performance are 
functionally equivalent. 


Received July 25, 1958.


Industries often exercise sufficient control
in disposing of their wastes so that no noxious 
or toxic substances are added to natural 
streams. However, even when such care is 
taken, natural waters may be discolored by 
industrial wastes. The discoloration is offen- 
sive to individuals with riparian rights on 
the waterways and also to those who wish to 
make recreational uses of the waterways. In- 
dustries which discharge colorants may incur 
the ill-feelings or the lawsuits of people in 
contact with the streams. 

Some regulatory agencies have attempted 
to establish standards with regard to color- 
ants in streams. These standards are fre- 
quently arbitrary and difficult for an industry 
to meet. For example, the New England 
Interstate Water Pollution Control Commis- 
sion stated that the maximum colorant to be 
permitted in streams “will be amounts that 
are not objectionable” (Feller & Newman, 
1951, p. 3). The term “objectionable” is 
inappropriate as a standard since it is not 
specified in terms of objectionable to whom 
under what conditions, nor is it related to 
any quantitatively measurable scale. 

The study reported below was undertaken 
to determine a method for giving quantitative 
meaning to the qualitative term “objection- 
able” as it applies to colored wastes in 
streams. 

To establish a method for quantifying the 
term “objectionable,” there were three condi- 
tions to be fulfilled. One, there had to be a 
way of measuring color; two, there had to 
be a way of relating different colors to a 
measurable scale; and three, there had to be 
a device for simulating a natural stream so 
that qualitative judgments of objectionable 


1 This research, directed by Harold M. Corter and 
Nelson L. Nemerow, was sponsored by the National 
Institute of Health. 

2 Now at Michigan State University. 


QUANTIFICATION OF THE TERM “OBJECTIONABLE” 
AS APPLIED TO COLORANTS IN NATURAL 
WATERWAYS 1 


JOHN H. WAKELEY 2 
North Carolina State College 


colors could be obtained under laboratory 
conditions. 

A Photovolt Photoelectric Reflection Meter, 
Model 610 with a 610-D search unit, modified 
by Coss (Coss & Nemerow, 1958) for use in 
measuring the color of liquids, was employed 
to measure colors. Hunter’s (1942) tri- 
stimulus color filters were used in the instru- 
ment; and colors were defined in terms of 
dominant wave length, luminance, and purity 
in accordance with the procedure adopted by 
the International Commission on Illumination 
(Judd, 1933). Using this system of 
measuring color, any color can be given a 
location in color space. 

The method for relating different colors 
employed the Hunter-Scofield color-difference 
formula (Hunter, 1942). This formula pro- 
vides a means for measuring the distance be- 
tween two colors in color space in terms of 
the National Bureau of Standards’ unit of 
color-difference (Hunter, 1942, p. 519). The 
formula was experimentally derived and is 
expressly intended for use with the Hunter 
tristimulus color filters. A complete descrip- 
tion of the formula and examples of its use 
can be found in Hunter (1942) and Wakeley 
(1958, p. 48). 

Qualitative judgments were obtained by us- 
ing the Streamviewer to present the water 
from a natural stream (the color of which had 
been measured) to an S. The Streamviewer 
was essentially a box constructed of one-half 
inch plywood with dimensions of 24 inches 
from front to back, 25 inches from side to 
side, and 20 inches from top to bottom. The 
box had a viewing slot which permitted one 
person at a time to view the interior. A trough 
was fitted to the bottom of the box at the 
rear and in line of sight for a person looking 
through the slot. A circulating pump was 
fitted to the trough to maintain a flow of 
water in the trough. The interior of the box 
was designed to represent a rural, summer 
scene from the Drowning Creek region near 
Hoffman, North Carolina. The rear wall of 
the interior was painted to represent the re- 
gion mentioned. The bottom was painted to 
represent grass. A daylight-type fluorescent 
lamp was used to light the interior. When 
appropriately colored water was put into the 
trough, the apparatus in operation simulated 
a natural stream in its natural setting. A com- 
plete description of the Streamviewer may be 
found in Wakeley (1958, p. 52). 


Procedure 


The procedure for obtaining qualitative judgments 
consisted of having an S look into the Streamviewer 
and observe while the natural color of the water 
was gradually changed by the addition of a colorant. 
When an S objected, a sample of the objectionable 
water was removed for color measurement. The 
trough of the Streamviewer was again filled with 
the natural water, and the S again observed while 
a different colorant was added until he objected. 
This process was repeated with each of six colorants. 
Twenty Ss followed the same procedure. 

Water used in the trough of the Streamviewer 
was matched to a sample of water taken from 
Drowning Creek near Hoffman, North Carolina, 
during a period of average summer flow and average 
color conditions. Records of the Geological Survey, 
United States Department of the Interior, Raleigh, 
North Carolina, were used to establish the average 
values mentioned above. 

Twenty North Carolina State College male stu- 
dents with normal color vision were used as Ss. 
Subjects were residents of Hoke, Richmond, or Scot- 
land Counties, North Carolina, since these counties 
were near the site where the stream was sampled. 
This sample was used because it represented a group 
which has contact with the stream under considera- 
tion, because it was readily available, and because 
it represented a group of superior ability in color 
discrimination. A study by Smith (1943) concluded 
that after the age of 14 years, males evidence better 
color discrimination than females, that individuals 
above the educational mean for their age group are 
superior in color discrimination, and that color dis- 
crimination ability increases to a maximum at about 
25 years of age and then declines slowly. 

Six textile dyes—a red, an orange, a yellow, a 
green, a blue, and a violet—were used as the 
colorants. 


Results and Discussion 


The data collected consisted of the 120 
colors found objectionable, i.e., each of 20 
Ss objected to a certain color resulting from 
the addition of each of six different colorants. 


The colors found objectionable were compared 
to the color of the natural stream by means 
of the Hunter-Scofield color-difference for- 
mula. The twenty color-difference scores for 
each colorant added were tested for skewness 
and kurtosis, and it was determined that no 
distribution of color-difference values was 
either significantly kurtotic or significantly 
skewed. 

Following the tests for skewness and kurto- 
sis, the normal distribution curve was adopted 
as the model for obtaining a color-difference 
value for each colorant which was objection- 
able to fewer than 5% of the population rep- 
resented by the sample. This color-difference 
value, subsequently called the 5% OP (Objectionable 
Point) score, was computed by 
the formula 5% OP = mean score − 1.65 × 
standard deviation. Table 1 shows the 5% 
OP scores obtained. Figure 1 presents the 
relationship of the 5% OP scores to the 
original color of the stream. 
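
The 5% OP computation described above can be sketched in a few lines. This is a minimal illustration, not the author's own procedure: the function name `op_score` is hypothetical, and the Red values (mean 52.95, standard deviation 23.40) are taken from Table 1 on the assumption that its columns are read correctly.

```python
from statistics import NormalDist

def op_score(mean, sd, z=1.65):
    """Color difference at the lower 5% point of a normal distribution.

    mean and sd describe the color-difference scores at which Ss objected;
    z = 1.65 is the rounded one-tailed 5% normal deviate used in the text.
    """
    return mean - z * sd

# Red colorant, as read from Table 1 (assumed column reading).
red_op = op_score(52.95, 23.40)
print(round(red_op, 2))   # 14.34

# The multiplier 1.65 is a rounding of the exact 95th-percentile deviate.
z_exact = NormalDist().inv_cdf(0.95)
print(round(z_exact, 3))  # 1.645
```

The same function with a multiplier of zero returns the mean, which corresponds to the color difference objectionable to 50% of the population under the normal model.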

Figure 1 shows the area of color change 
about the original color of the stream that is 
objectionable to less than 5% of the popula- 
tion represented by the sample used and also 
the area objectionable to 50% of the same 
population. The points on each of the six 
lines radiating from the center (original color 
of the stream) were color-difference values 
obtained in this study. The lines which con- 
nect the points and which define the two 
areas are estimates of the values which would 
be obtained if different colorants were used. 

If a color standard were to be set for 
Drowning Creek from the results of this 
study, the standard might be as follows. An 


Table 1 


Color-Difference Values Obtained for Each 
Distribution of Colorants 


Colorant    Standard Deviation    5% OP      Mean 

Red               23.40           14.34     52.95 
Orange            16.68           61.22     88.74 
Yellow            34.37 
Green             26.37           40.09     83.60 
Blue              31.32           80.56    132.24 
Violet            26.47           31.83     75.50 


Note.—All values are in National Bureau of Standards’ units 
of color difference. 




Fig. 1. Relationship of 5% OP scores and mean 
scores to the original color of the sample stream. 


industry which discharges colorants into 
Drowning Creek must exercise sufficient treat- 
ment of waste and care in discharge policy 
so that a color difference between the stand- 
ard color of the stream (as determined by 
a suitable agency) and the color of the stream 
after an effluent is introduced is never greater 
than the values which make up the 5% OP 
boundary about the standard color. 

The values obtained in this study apply 
only to the population sampled. To employ 
this method for setting standards for color 
pollution, it would be necessary to draw a 
sample of observers from the population which 
had contact with the certain stream for which 
a standard was desired. 

The method of this study, namely, having 
individuals in contact with the particular 
stream give judgments as to when the color 
of a stream becomes objectionable, quantify- 
ing judgments in terms of a color-difference 
score, relating the quantified judgments to 
the normal curve, and determining certain 
percentages of the population which object 
to a certain color change, is limited. With 
this method each stream presents a unique 
problem. Further investigation directed to- 
ward making the method applicable to areas 
larger than a specific stream is needed. 
Investigations directed toward making the 
method more widely applicable should con- 
sider the following questions. What is the 
relationship among variables such as original 
color of stream, frame of reference of the 
viewers (sportsmen, tourists, farmers, etc.), 
and hue of colorant? How do the variables 
mentioned above influence the size and shape 
of the 5% OP area as shown in Fig. 1? Is 
there a difference in tolerance between those 
who perceive the colorants as esthetically un- 
pleasing and those who perceive the colorants 
as interfering with the usefulness of the 
stream? What relation exists between judg- 
ments using the Streamviewer and judgments 
under natural conditions? 3 


Summary and Conclusions 


A study was conducted to provide a basis 
for determining a method for relating judg- 
ments of the objectionableness of colorants 
in natural waters to a measurable scale of 
color difference. 

Twenty Ss observed a simulated natural 
stream as it was gradually changed in color 
by the addition of each of six different color- 
ants. Every S indicated when the color of 
the stream became objectionable for each of 
the colorants. A color-difference formula was 
used to determine how greatly the objection- 
able colors differed from the original color of 
the stream. The distribution of scores for 
each of the colorants was examined and found 
to be distributed in an essentially normal 
manner. Using the normal curve, 5% OP 
scores were determined for each of the sepa- 
rate colorants. A 5% OP score represented 
a color-difference between the original stream 
color and the color, resulting from the addi- 
tion of a certain colorant, which was objec- 
tionable to fewer than 5% of the population 
represented by the sample used. 

The major conclusion of this study was 
that the term “objectionable” as it applies to 
colored wastes in streams can be quantified. 
This conclusion was based on two specific 
findings: 


3 This question is currently being investigated by 
the Department of Civil Engineering at North Carolina 
State College. 


1. The point at which the color of a stream 
becomes objectionable as a result of the addition 
of a colorant was expressed in terms of 
a number. This number represented the difference 
between the original color of the 
stream and the certain color objected to. 

2. The color-difference scores for the addi- 
tion of any certain colorant were found to be 
distributed normally in the sample used in 
this study. 


An inference was made that the normal 
curve can be employed to determine color 
differences which will be objectionable to cer- 
tain percentages of a population which is in 
contact with a particular stream. 


Received July 28, 1958. 
A STUDY OF ENGINEERS’ CRITERIA FOR CREATIVITY 1 


THOMAS B. SPRECHER 2 


It is a common belief that the creative 
person is vital to our national welfare and 
to the well-being of individual industries. 
Recent years have seen a considerable increase 
in the number of studies concerned with the 
measurement of creativity in engineers and 
scientists. Not at all uniquely, criterion de- 
velopment is a crucial problem, although fre- 
quently it is assumed that creativity is a 
concept that can be defined, given to a set 
of judges, and used by them. To the author’s 
knowledge, no previously reported study has 
proceeded by allowing judges to read into 
the term creativity whatever meaning they 
chose and then attempting to specify this 
meaning. This study asserts the premise that 
differences among expert judges regarding the 
meaning of the term creativity are meaningful 
variability indicative of the kinds of ideas 
connoted. If creativity means different things 
to different people, this variability should be 
explored before attempts to define it are 
undertaken. 

In order to describe the creative person in 
a technical setting, engineers in a large indus- 
trial organization producing aircraft equip- 
ment were questioned. This study reports on 
those things engineers and supervisors men- 
tioned when asked to tell why they felt that 
some engineers and some solutions to engi- 
neering problems were more creative than 
others. Estimates of the predictability of 
various criteria used in the study will also be 
reported. Although exception may be taken 
to the use of engineers to describe creative 
people per se, these men were well acquainted 
with each other and the requirements of their 
jobs demanded new ideas to meet new and 


1 Submitted to the graduate school of the Univer- 
sity of Maryland in partial fulfillment of the require- 
ments for the Ph.D. degree. The author is very 
grateful for the assistance of the following indi- 
viduals: John W. Gustad, University Counseling 
Center, University of Maryland, College Park, Mary- 
land; and Ray C. Hackman, Psychological Service 
of Pittsburgh. 

2 Presently Consulting Psychologist at Psychologi- 
cal Service of Pittsburgh. 
complex situations. Whether or not we grant 
the ability of engineers concerned with the 
production of new equipment to describe the 
creative person, we will report here what such 
engineers think creativity means. Important 
practical considerations require that their 
opinions be considered carefully. 


Method 


Sampling. One hundred and seven engineers were 
drawn at random from a population of work groups. 
The basic decision affecting the sampling plan was 
the decision to attempt to get ratings on each man 
in the study from his peers as well as from his su- 
pervisor(s). The sampling plan required that the 
men in these work groups be sufficiently well ac- 
quainted with each other so that the men could rate 
each other. A further restriction was that there be 
at least 15 men in each of the groups so that 12 men 
could be available without being restricted by unex- 
pected travel, sickness, etc. The Ss were males, and 
without exception were performing engineering work 
at the time of the study. All except one had as a 
minimum either a B.S. degree in engineering or its 
equivalent. 

After contact with all of the engineering depart- 
ments and the supervisors, a population of depart- 
ments consisting of at least four departments within 
each of the three areas of service, project, and re- 
search was developed. These three areas had been 
previously selected as representing three broad types 
of functions performed by the various departments 
within the company and which might serve as bases 
for differential assignment. Advance judgment in 
the company assumed this difference among work 
areas was important. Random sampling was used 
to select three departments out of the four avail- 
able within each of the areas, and within each de- 
partment random sampling was used to call in 12 
men from those available. The sample finally con- 
sisted of 36 men from research groups, 36 from 
service groups, and 35 from project groups. 
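
The two-stage random draw described above (three of four departments within each area, then 12 men within each department) might be sketched as follows. The roster, the department labels, and the function name `draw_sample` are all hypothetical; the study's actual rosters are not reported.

```python
import random

def draw_sample(roster, depts_per_area=3, men_per_dept=12, seed=None):
    """Two-stage random sample.

    roster maps area -> {department -> list of available men}.
    Returns the same nested shape, holding only the sampled
    departments and the sampled men within each of them.
    """
    rng = random.Random(seed)
    sample = {}
    for area, departments in roster.items():
        chosen = rng.sample(sorted(departments), depts_per_area)
        sample[area] = {
            dept: rng.sample(departments[dept], men_per_dept)
            for dept in chosen
        }
    return sample

# Hypothetical roster: 3 areas x 4 departments x 15 available men.
roster = {
    area: {
        f"{area}-dept{d}": [f"{area}-{d}-{m}" for m in range(15)]
        for d in range(1, 5)
    }
    for area in ("service", "project", "research")
}

sample = draw_sample(roster, seed=1)
total = sum(len(men) for depts in sample.values() for men in depts.values())
print(total)  # 108 when every sampled department supplies 12 men
```

The full design yields 108 men; the sample reported here contained 107 because the project groups supplied 35 rather than 36.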

Procedure. The testing itself was conducted in 
one session and lasted approximately two hours. The 
men were given a variety of paper and pencil tests 
measuring various factorially defined abilities. They 
were also asked to solve three brief open-end engi- 
neering problems. These tests and the engineering 
problems will be described later. Immediately fol- 
lowing the administration of these psychological and 
engineering tests, each man was asked to rank in 
terms of creativity the other 11 men from his sec- 
tion who were also participating in the study. No 
definition of creativity was given to him. Ties in 
the ranking procedure were allowed, and some of 
the men partially or completely refused to complete 
the rankings. The number refusing to participate 
varied considerably from department to department, 
but did definitely reduce the number of judges avail- 
able. After most had filled out the ranking form, 
they were asked to give reasons why they had 
chosen the top two men as more creative than the 
bottom two men. They were asked to make their 
comments specific and to give recent incidents sup- 
porting them. 

A ranking form similar to the foregoing was given 
to the supervisors of each of the nine groups par- 
ticipating in the study, and each supervisor was 
asked to rank the 12 men from his work area with 
regard to their creativity. Again, no definition of 
creativity was given. The supervisors also were 
asked to give reasons for the differences between the 
high ranked and low ranked men. 

A similar procedure involving both peers and su- 
pervisors was used in judging the creativity of the 
answers to the engineering problems. The ranking 
procedure will be described in detail later, but after 
ranking sets of such answers, each man compared 
the sets of answers located at the two extremes and 
gave specific reasons why they differed in their 
creativity. 

Predictors and criteria. The following are the pa- 
per and pencil tests previously referred to which in 
general were used as predictors of the ratings of 
the men: 


I. Scores on brief ability tests: 


A. Plot Titles—which yielded two scores, one for 
Originality based on the number of plot titles 
written about a brief story which were judged 
to be clever, and the other score being one 
for Idea Fluency, based on the number of 
titles written which were judged not to be 
clever. 

B. Synonyms—the number of synonyms given to 
common words, giving a score on Associative 
Fluency. 

C. Sign Changes—the number of simple arithmetic 
operations carried out successfully when 
the ordinary meanings of the elementary symbols 
of arithmetic changed from section to 
section of the test and producing a score on 
Adaptive Flexibility. 

D. Vocabulary—a high level multiple choice general 
vocabulary test that represented the factor 
of Verbal Comprehension. 


II. Scores on the engineering problems: 
A. Shop problem 
B. Mass Flow problem 
C. Mechanization problem 


With the exception of the vocabulary test, the abil- 
ity tests were adapted from those devised by J. P. 


3 The correlation between these two scores was 
not significant at the .05 level. 
Guilford (Guilford, Wilson, Christensen, & Lewis, 
1951; Guilford, Wilson, & Christensen, 1952) in his 
studies of higher level aptitudes. Since his tests had 
been developed for the Air Force, it was not possible 
to use the identical tests, but versions closely similar 
to them were developed. 

The engineering problems were problems techni- 
cally relevant to the work performed at the com- 
pany and were constructed by supervisors. Ten such 
problems were developed, and three were used in 
this study. These three were those which success- 
fully passed a screening procedure in which 10 en- 
gineers worked these problems under conditions simu- 
lating those to be used in the study. These condi- 
tions involved 15-minute time limits, individual work, 
the problem stated as an open-end question, and also 
included some consideration of the variety of an- 
swers obtained from each man as well as the variety 
of answers from the group as a whole. 

The six criteria for creativity used are presented in 
Table 1. The last two criteria have already been 
described. 

To score the answers to the engineering problems 
a ranking procedure was used which involved rank- 
ing the whole set of answers given by each man to 
each problem. All the answers that each man gave 
to each problem were typed on a separate page with 
five carbon copies, and each set of answers was 
ranked by five different engineers. All the rankings 
used in this study were transformed by assuming 
that the ranking corresponded to a normal distribu- 
tion in which the mid-point of each of 10 cells (for 
example) was assumed to represent a point which 
was the center of 10% of the total area in a normal 
distribution. The corresponding Z score for this 
point was determined, multiplied by 10, and then 
rounded to the nearest whole number. The lowest 
negative number was called one and the others were 
formed by adding the numerical distance of each 
successive number from its next lowest neighbor. 
This resulted in a scale in which the highest number 
was associated with the best performance. 
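
The rank transformation just described can be illustrated for the 10-cell case. This is a sketch under stated assumptions: the function name `normalized_scale` is hypothetical, the cells are taken as equal slices of the normal curve with mid-points at 5%, 15%, and so on, and ties (which the study allowed) are not handled.

```python
from statistics import NormalDist

def normalized_scale(n_cells):
    """Scale values for n_cells ranked positions, worst first.

    Each position is the mid-point of an equal-area slice of the
    normal curve. Its Z score is multiplied by 10 and rounded; the
    lowest value is then called 1 and the rest keep their distances.
    """
    unit_normal = NormalDist()
    scaled = [
        round(10 * unit_normal.inv_cdf((i + 0.5) / n_cells))
        for i in range(n_cells)
    ]
    lowest = scaled[0]
    return [value - lowest + 1 for value in scaled]

print(normalized_scale(10))
# [1, 7, 10, 13, 16, 18, 21, 24, 27, 33]
```

The last element belongs to the best-ranked set of answers, so the highest number goes with the best performance, as the text states.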

For each man in the study, the number of patent 
disclosures submitted through company channels in 
the past year was available. However, of the 90 
men in the study who had been with the company 
at least a year and so had an opportunity to develop 
a patent disclosure, only 24 had had one or 
more patent disclosures. Consequently, each man 
was classified as having either none or at least one 
patent disclosure. 


Table 1 


Criteria for Creativity 


Performance Criteria 

1. Rankings of answers to Shop problem 
2. Rankings of answers to Mass Flow problem 
3. Rankings of answers to Mechanization problem 
4. Number of patent disclosures in past year 

Subjective Criteria 

5. Rankings of men by supervisors 
6. Rankings of men by peers 

Reliability of the rating procedures. Inter-rater 
agreement in judging the answers to the engineering 
problems involved the correlation between two sets 
of two engineers. These two sets of engineers were 
randomly formed from among the engineers who 
had rated the answers to the engineering problems. 
The correlation between these sets within each of the 
three engineering problems was stepped up by the 
Spearman-Brown correction, using a factor of 2.5, 
as follows: for the Shop problem, from .69 to .85; for 
the Mass Flow problem, from .52 to .73; and for 
the Mechanization problem, from .70 to .85. 
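
The stepped-up values above can be checked against the standard Spearman-Brown formula, r_k = k r / (1 + (k - 1) r). The sketch below assumes that the factor of 2.5 is the ratio of the five raters per set of answers to the two raters in each correlated set; the text gives the factor but not its derivation.

```python
def spearman_brown(r, k):
    """Reliability expected when the number of raters is multiplied by k."""
    return k * r / (1 + (k - 1) * r)

# Obtained correlations between two sets of two engineers, per problem.
obtained = {"Shop": 0.69, "Mass Flow": 0.52, "Mechanization": 0.70}

for problem, r in obtained.items():
    print(problem, round(spearman_brown(r, 2.5), 2))
# Shop 0.85
# Mass Flow 0.73
# Mechanization 0.85
```

All three results agree with the figures reported in the text.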

The data also permitted estimates of the extent of 
agreement among different engineers on which men 
are creative. An estimate of the agreement of one 
randomly selected peer with another similarly se- 
lected peer in rating men gave a correlation of .55, 
but this figure was boosted to a substantially higher 
figure of .87 by using the Spearman-Brown prophecy 
formula since more than five peers rated each indi- 
vidual. Slightly dissimilar conditions hold for the 
estimated reliability of supervisors’ ratings of men. 
In some cases only one supervisor was able to rate 
the men in the department. Consequently, the cor- 
relation here applies only to those departments where 
at least two or more supervisors had provided ratings 
of the men as men. This correlation was .84, boosted 
from .66. The agreement of supervisors with peers 
in rating the men was calculated by using the aver- 
age supervisor rating of each man, and the average 
peer rating on each man. The correlation obtained 
here was .64. This correlation, recalculated using 
only those departments where at least two super- 
visors had rated men, was .73. 

Test-retest estimates of reliability were obtained 
for the supervisors, although they were not obtained 
for the peers. These correlations, transformed by 
Fisher’s Z, showed an average test-retest correlation 
of .87. 
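
Averaging correlations through Fisher's Z, as mentioned above, can be sketched as follows. The individual test-retest correlations are not reported, so the three values used here are purely hypothetical, and the unweighted mean is an assumption (sample sizes per supervisor are not given).

```python
import math

def average_correlation(rs):
    """Average correlations in Fisher's Z space, then transform back."""
    zs = [math.atanh(r) for r in rs]      # r -> Z
    return math.tanh(sum(zs) / len(zs))   # mean Z -> r

# Hypothetical test-retest correlations for three supervisors.
hypothetical_rs = [0.80, 0.87, 0.92]
print(round(average_correlation(hypothetical_rs), 2))
```

Averaging in Z space avoids the bias that arises when high correlations, whose sampling distributions are skewed, are averaged directly.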

Development of content analysis categories. The 
sheets on which both supervisors and peers had de- 
scribed creative men were formed into seven sets 
with approximately 12 to 15 sheets in each set. Each 
of these randomly formed sets was given to a pair 
of graduate students in psychology. Each member 
of the pair worked independently, and each read 
over the reasons and built a classification system, in 
whatever detail he chose, which represented all the 
ideas that were contained in the responses. These 
categories were examined by the author and com- 
bined into one master list comprising 55 categories. 
These categories can be condensed into the following 
areas: 


Content analysis categories for creativity of men 


I. Interpersonal relations 
II. Job satisfaction 
III. Personal background and general personal characteristics 
IV. Problem related behavior 

A. Problem type preferred and/or able to handle 
B. Approach used 
C. Manner of use of approach 
D. Results achieved 
E. Report of results 


An identical procedure was followed in forming 
categories for reasons why sets of answers were rated 
as creative, except that in this case the master list 
was put in a form similar to that developed for the 
analysis of reasons why men were judged creative. 

The investigator then classified all the actual rea- 
sons for the ratings of the men given by each of the 
supervisors and peers, and a rescoring of part of 
these data after an interval of one week yielded 
80% agreement as to the category to which it be- 
longed. A similar reliability check on the classifica- 
tion of the reasons why certain answers were judged 
to be creative yielded 82% agreement. 

In forming the classification system the investi- 
gator had assumed that good technical knowledge, 
for instance, characterized the creative person rather 


Table 2 


Rank Order of First Ten Variables in Reasons 
Why Men Are Creative 


Rank     Frequency 
Order    of Mention    Description of Variable 

 1       30            Is independent of others vs. needs guidance in problem solving. 
 2       27            Produces novel and unconventional solutions vs. routine ones. 
 3       23            Produces some solutions or one vs. no solutions. 
 4       22            Prefers new and difficult problems vs. prefers routine problems. 
 5       17            Produces practical and valuable solutions vs. impractical solutions. 
 6       16            Analyzes a situation vs. doesn’t or can’t analyze. 
 7       16            Shows energy and alertness vs. lacks energy. 
 8       15            Produces many solutions or ideas vs. few solutions. 
 9       14            Has a high degree of technical knowledge and academic achievement vs. a low degree. 
10       13            Organizes and plans ahead vs. is unorganized and overlooks details. 


Note.—Combining supervisors and peers, total N was 98. 
than poor technical knowledge, and such assumptions 
needed to be verified. Consequently, in the content 
analysis procedures, information was recorded with 
respect to which end of the continuum was actually 
attributed to the creative and which to the uncrea- 
tive person. Nine reversals occurred in judging the 
creativity of men, and most of these reversals took 
such forms as, “This uncreative person has a high 
degree of technical knowledge, but... .” Only one 
reversal occurred in judging the creativeness or un- 
creativeness of sets of answers to the engineering 
problems. 


Results 


For the content analysis data, tests of sig- 
nificance, including chi-square analyses and 
Fisher’s exact probability test (Siegel, 1956), 
were run to determine if there were differ- 
ences between the supervisors and peers in 
the variables stressed. The data on which 
these analyses were made are not presented 
here, but the over-all conclusion was that 
there were no significant differences between 
supervisors and peers in the factors they em- 
phasize when they rate engineers as to their 
creativity. 
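
Since the data behind these tests are not presented, the following sketch only illustrates the kind of comparison involved: a two-sided Fisher exact test on a 2 x 2 table of supervisors versus peers by whether a given category was mentioned. The counts are hypothetical, and the function is a plain standard-library implementation, not the procedure in Siegel (1956) verbatim.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= observed * (1 + 1e-9)
    )

# Hypothetical counts: rows are supervisors and peers,
# columns are "mentioned the category" and "did not mention it".
p_value = fisher_exact_two_sided(6, 14, 24, 54)
print(p_value > 0.05)
```

A p-value above .05 for a category would be read as no significant difference between supervisors and peers in how often that category was stressed.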

A similar procedure in analyzing the dif- 
ferences between supervisors and peers in 
their reasons why answers were judged to be 
creative was carried out. The over-all con- 
clusion for these data was the same as that 
for the data for the creativity of men: there 
were no significant differences between super- 
visors and peers in the meaning they attach to 
creativity of answers. 

In order to highlight the most important 
outcomes of the content analysis procedures, 
Tables 2 and 3 present, respectively, the most 
frequently used categories in judging the crea- 
tivity of men and in judging the creativity of 
answers to engineering problems. The drop- 
off in frequency of mention of the remaining 
categories was so slow that an arbitrary cut- 
ting point was established for each of the 
tables. 

In addition to the content analysis results, 
other information relating to the psychologi- 
cal and engineering tests used in the study 
should be reported next. As previously men- 
tioned, one third of the men in the study were 
from service areas, one third from research, 
and one third from project areas. An analy- 
sis of variance design revealed the differences 
among these areas in their performance on 
Table 3 


Rank Order of First Nine Variables in Reasons 
Why Answers Are Creative 


Rank     Frequency 
Order    of Mention    Description of Variable 

 1       34            Produces comprehensive and generalizable solutions vs. sketchy solutions. 
 2                     Presents correct and appropriate solutions vs. incorrect ones. 
 3                     Produces many solutions or ideas vs. few solutions. 
 4                     Produces novel and unconventional solutions vs. routine ones. 
 5                     Produces some solutions or one vs. no solutions. 
 6                     Uses flexible and varied approaches vs. inflexible and narrow ones. 
 7                     Produces practical and valuable solutions vs. impractical solutions. 
 8                     Doesn’t blame others vs. does blame others and antagonizes them. 
 9                     Is interested in solution of problems vs. avoids problems. 


Note.—Combining supervisors and peers, total N was 53. 


the engineering tests to be significant at the 
O01 level. Individual t tests between each 
possible pair among service, project, and re- 
search areas were also significant at the .01 
level. In the interests of anonymity, no in- 
dividual results on areas will be reported. It 
is of interest to note that the problems were 
originally devised by supervisors in only one 
of the areas. It was expected that this bias 
might later affect results so that this area 
would perform the best on these problems. 
However, the area for which these problems 
were developed by their supervisors was not 
the best in its performance on them. 

The agreement of ratings of men as crea- 
tive men with ratings of performance could 
also be studied. In general, the level of agree- 
ment is low. Table 4, among other data, re- 
lates peer and supervisor ratings of men to 
the ratings of performance of the same men 
Table 4 


Intercorrelations of the Criterion Measures 


Performance Criteria Subjective Criteria 


Mass 


Flow Mech. Patent Sup. Peer 
Prob. Prob. Disc. Ratings Ratings 
Performance Criteria 
Shop prob. .18* a .09 ll 07 
Mass Flow prob. 14 .06 


Patent disc. 


Subjective Criteria 


Sup. rating .64** 
Peer rating 


Note.—N varies from 90 to 107. Those correlations involving patent disclosures are point biserial r’s. All the remainder 
are Pearson product moment correlations. 
* Statistically significant at the 5% level. 
** Statistically significant at the 1% level. 


on the engineering problems. It shows that 
out of six possible relationships, only two are 
significantly related. 

The agreement of ratings of performance 
with each other on the various problems was 
also obtainable from the data. There were 
three possible relationships among these three 
problems, and these are also presented in 
Table 4. It seems obvious that while there 
is some relationship among these engineering 
problems, correlations of the order of .2 indicate 
that they are not measuring the same thing. 

One other important question relates to the 
predictability of the various criteria used in 
the study. These results are summarized in 
Tables 5 and 6. In general, it seems that 
performance criteria are the most predictable. 
Of the engineering problems developed specifically 
for this study, the mechanization 
problem is the most predictable. Four of its 
multiple R’s were significant at the .01 level, 


Table 5 


Multiple Correlations of Various Combinations of Predictors with Criteria 


Engineering Problems as Criteriaᵃ                             Ratings of Men as Criteriaᵇ 

Combinations   Multiple R   Multiple R     Multiple R         Combinations   Multiple R    Multiple R 
of             w/ Shop      w/ Mass Flow   w/ Mechanization   of             w/            w/ 
Predictors     Problem      Problem        Problem            Predictors     Supervisors   Peers 

12345 .18 12345678 .37 .40* 
1234 1234567 .35 a 
123 .16 .29* a 123456 .34 a 
12 .16 .26* a 12345 .34* 
1234 .34* 
123 a 
12 .26* .29* 


ᵃ Predictors are as follows: 1. originality; 2. idea fluency; 3. adaptive flexibility; 4. associative fluency; and 5. vocabulary. 
ᵇ Predictors are as follows: 1. mechanization problem; 2. idea fluency; 3. adaptive flexibility; 4. associative fluency; 5. 
vocabulary; 6. shop problem; 7. mass flow problem; and 8. originality. 
* Statistically significant at the 5% level. 
** Statistically significant at the 1% level. 
Table 6 


Efficiency of Cutting Scores Based on a Discriminant 
Function in Identifying Men With 
Patent Disclosures 


Of Those       No Patent      Patent 
Achieving      Disclosure     Disclosure(s) 
a Cut-off      Was Held       Were Held       Total 
Score of       by             by              N Was 
0 
11 
25 


43 
Od 
66 


the highest being a multiple R of .52 with a 
combination of the scores on originality, idea 
fluency, adaptive flexibility, associative flu- 
ency, and vocabulary. However, no cross- 
validation procedures were carried out on any 
of the problems, and the practical significance 
of this result is indicative rather than actual, 
since it would shrink to limited usefulness 
with repetition of the study. 
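
The expected shrinkage can be given a rough magnitude. The sketch below applies the familiar adjusted R² correction, which is not a procedure the study reports, and it estimates shrinkage toward the population value rather than the result of an actual cross-validation. The sample size of 100 is an assumption drawn from the range of 90 to 107 given in the note to Table 4, and the function name is hypothetical.

```python
import math

def adjusted_multiple_r(r, n, k):
    """Shrunken multiple R from adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1).

    r: observed multiple R, n: sample size, k: number of predictors.
    A negative adjusted R^2 is reported as zero.
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    adj_r2 = 1.0 - (1.0 - r * r) * (n - 1) / (n - k - 1)
    return math.sqrt(max(adj_r2, 0.0))

# Reported R = .52 with five predictors; n = 100 is an assumed sample size.
print(round(adjusted_multiple_r(0.52, 100, 5), 2))
```

Under these assumptions the corrected value falls somewhat below the observed .52, and the drop grows quickly as the sample shrinks or the number of predictors rises.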

Table 6, presenting the results of a dis- 
criminant analysis of the patent disclosure 
data, adds support to the belief that perform- 
ance criteria are potentially predictable al- 
though there is again no evidence that these 
results would stand up on cross-validation. 
Using an arbitrary cutting score of 2300, 
seven out of the seven men included would 
have been those with patent disclosures. With 
a cutting score of 2000 the chances are about 
50-50 that a man included in a group so se- 
lected would really produce one or more pat- 
ent disclosures. The discriminant function 
weights used in deriving this table were de- 
veloped from a multiple correlation analysis 
using point biserial r’s between the predictor 
variables and the dichotomized predictand, 
having or not having a patent disclosure in 
the past year. The multiple R so obtained 
was .42. 
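
A table of the kind shown in Table 6 can be tallied as sketched below. The scores and disclosure flags are invented for illustration and are not the study's data; the function name and the "at or above the cutoff" convention are likewise assumptions.

```python
def cutting_score_table(scores, has_disclosure, cutoffs):
    """For each cutoff, count the men scoring at or above it,
    split by whether they hold a patent disclosure (1) or not (0)."""
    rows = []
    for cutoff in cutoffs:
        selected = [d for s, d in zip(scores, has_disclosure) if s >= cutoff]
        hits = sum(selected)      # selected men holding a disclosure
        total = len(selected)     # all men selected at this cutoff
        rows.append({
            "cutoff": cutoff,
            "no_disclosure": total - hits,
            "disclosure": hits,
            "total": total,
            "hit_rate": hits / total if total else None,
        })
    return rows

# Hypothetical discriminant scores and disclosure flags (1 = has a disclosure).
scores = [2450, 2380, 2310, 2200, 2150, 2050, 2010, 1900, 1800]
flags = [1, 1, 1, 0, 1, 0, 1, 0, 0]
for row in cutting_score_table(scores, flags, [2300, 2000]):
    print(row)
```

Lowering the cutoff admits more men but dilutes the proportion who hold disclosures, which is the trade-off the two cutting scores in the text illustrate.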


Discussion 


A previous survey of the literature in the 
area of creativity had revealed two factors 
that were predominantly mentioned. The 
first was novelty of ideas. The second was 
the ability to produce valuable and practical 
solutions. Both of these factors appeared in 
this study in judging the creativity of men as 
well as in judging the creativity of answers. 
This lends some additional empirical confirmation 
to the importance of these concepts. 

However, neither of these was at the top 
of either list. Also, other kinds of variables 
appeared. Work-habit variables were fre- 
quently placed in the top 10 in judging the 
creativity of men. The most striking of 
these is the variable mentioned most fre- 
quently, “independence of others.” In the 
opinion of the peers and supervisors in this 
study the creative person was characterized 
by an ability to proceed on his own. Other 
work-habit variables were such characteristics 
as a tendency to analyze a situation, marked 
energy, and an ability to organize and plan 
the details of a project. It is felt that these 
variables, representing seemingly workmanlike 
approaches to a new task, have been largely 
neglected or overlooked in previous studies of 
creativity and should be exploited in future 
work, particularly if we are interested in pre- 
dicting behavior judged to be creative by en- 
gineers. 

The results of the content analysis of the 
reasons why answers are creative show that 
the most frequently mentioned variable was 
that the answers be comprehensive and gen- 
eralizable. It should be remembered that en- 
gineers were judging these answers and one 
might assume that they have a liking for 
thorough, complete, and generalizable an- 
swers. These were technical answers to engi- 
neering problems judged by personnel trained 
in engineering. Novelty of solution was men- 
tioned by approximately half as many men 
as mentioned comprehensiveness of answers. 
This reversal of emphasis suggests that crea- 
tivity in this setting has a different meaning 
than that assigned to it in the general litera- 
ture. In any case, it is important that the 
meaning of creativity be determined for the 
specific situation where it is to be predicted. 
If these results are not specific to this one 
industrial setting, then future work in the 
field of creativity could profit from considera- 
tion of these potentially important variables. 

The occurrence of novelty of ideas as a 
category in judging both the creativity of an- 
swers and in judging the creativity of men 
gives it special importance. However, the variety 
of ways in which novelty of ideas, or 
originality, may be measured is still largely 
unexplored. Insofar as the investigator knows, 
only Guilford (Guilford et al., 1951, 1952) 
has done a factor-analytic study of a variety 
of ways of measuring originality. Barron 
(1955) presented some evidence on the inter- 
correlations of various measures of originality, 
but his work did not demand that the dimen- 
sionality of his tests or scoring procedures be 
determined. 

Since the achievement of practical and valu- 
able answers appears characteristic of both 
creative men and creative answers, it would 
seem worthwhile in future studies of crea- 
tivity to include a scoring procedure which 
rates answers as well as men on practicality. 
While it might be difficult to assign psycho- 
logical factors which would account for the 
production of practical ideas, an applied study 
might gain considerably by using such a scor- 
ing procedure. 

The variable ranked eighth in the list of 
reasons why answers are creative presumably 
appeared because one of the technical prob- 
lems involved some aspect of interpersonal re- 
lations. Some of the men judged to be un- 
creative were judged as such because they 
were antagonistic in their attitude towards 
others. 

Tests were of course also used for an in- 
trinsic interest in concurrent validity. The 
uninstructed judges used in the study did in 
general show the agreement among themselves 
as to who or what they rated as creative 
which is basic to concurrent validity. Several 
judges (at least three, and this study 
suggests five) are needed to obtain a fairly 
high level of reliability in terms of agreement 
on any one rating assigned to an individual. 
These results hold both for the supervisors 
and for peers. Also, these two groups by and 
large did agree on the rating they assigned to 
an individual, and the ratings seem to be 
stable, since the supervisors who were requested 
to rerate these same men at a later 
date came up with essentially high test-retest 
correlations. Although by the nature of their 
supervisory responsibilities they would per- 
haps be expected to have this high reliability, 
it was an important verification. 
The caliber of the personnel involved in 
different work areas within an organization 
may have an important bearing on personnel 
assignment or selection, if the results from 
the engineering performance tests have gener- 
ality. Clear and consistent differences be- 
tween work areas appeared on these problems 
in spite of the bias in favor of one of the 
areas due to their development by a particu- 
lar supervisory group. 

One other major conclusion emphasizes the 
extent of differences between performance rat- 
ings and the more subjective over-all evalua- 
tions of men. There seemed to be little 
agreement between such measures, since the 
relationships, while significant in some instances, 
were low (see Table 4). In part this 
may have been due to the low interrelationships 
among the engineering problems themselves. 
Table 4 also suggests that a definitely 
larger number of problems, possibly of the order 
of 10, is necessary to provide a minimum 
sampling of the variety of technical tasks 
possible. 
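
One way to see why a figure of the order of 10 is plausible is the Spearman-Brown formula, sketched below. The authors do not say how they arrived at their estimate, so this is an illustration rather than their method. The mean intercorrelation of .2 follows the "order of .2" mentioned for the engineering problems, the target reliability of .70 is an assumption, and both function names are hypothetical.

```python
import math

def composite_reliability(mean_r, k):
    """Spearman-Brown reliability of a composite of k comparable measures
    whose average intercorrelation is mean_r."""
    return k * mean_r / (1.0 + (k - 1) * mean_r)

def measures_needed(mean_r, target):
    """Smallest number of measures whose composite reliability reaches target."""
    k = target * (1.0 - mean_r) / (mean_r * (1.0 - target))
    # Subtract a tiny tolerance so exact integer solutions are not rounded up.
    return math.ceil(k - 1e-9)

# Three problems intercorrelating about .2, as in the present study.
print(round(composite_reliability(0.2, 3), 2))
# Problems needed for an assumed target reliability of .70.
print(measures_needed(0.2, 0.70))
```

With three problems the composite reliability stays well below the assumed target, while roughly ten problems are needed to reach it.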

Importance is attached to the fact that the 
objective performance measures seemed to be 
the more predictable. Both patent disclosures 
and the engineering problems came closer to 
practical predictability than the more sub- 
jective measures used. However, it is likely 
that the ratings of the men as men could 
be better predicted by including work habit 
measures, since work habits, at least insofar 
as the verbalized reasons for the ratings were 
concerned, were related to the judgments. If 
this result holds up in further studies, impor- 
tant practical gains could be made in pre- 
dicting such a criterion by the use of such 
variables. 


Summary 


The meaning of creativity was investigated 
by asking engineers in a large industrial firm 
to give reasons why men they had ranked 
highest in creativity differed from those 
ranked lowest. These men also justified their 
rankings on creativity of answers to brief 
open-end engineering problems. No signifi- 
cant differences in the bases for such judg- 
ments were noted between engineers and 
supervisors of engineers. 

The content analysis results verified a widespread 
impression in the literature that the 
novelty and worth of ideas are important 
factors in creativity. They also brought out 
other factors which had been largely overlooked 
in the literature, such as independence 
in problem solving and the achievement of 
comprehensive answers. 

The various criteria used indicate a fair 
level of agreement among these untrained raters 
as to which products or which people are 
creative, although there may be basic differences 
in the meaning of the word creative 
according to whether it is used as an over-all 
rating of a person or as a rating of specific 
engineering performance. There also seems 
to be more practical predictability in the performance 
ratings than in the more subjective 
over-all ratings, granting two qualifications. 
First, since the ratings of the total impression 
of an engineer were affected by work-habit 
characteristics, inclusion of these variables as 
predictors might increase the predictability of 
such a criterion. Second, if performance ratings 
of engineering products are used, a fairly 
large sample of the various kinds of problems 
is required. 


Received July 30, 1958. 


References 


Barron, F. The disposition toward originality. J. 
abnorm. soc. Psychol., 1955, 51, 478-485. 

Guilford, J. P., Wilson, R. C., Christensen, P. R., & 
Lewis, D. J. A factor-analytic study of creative 
thinking: I. Hypotheses and description of tests. 
Los Angeles: Univer. of Southern California, 1951, 
Psychol. Lab. Rep., No. 3. 

Guilford, J. P., Wilson, R. C., & Christensen, P. R. 
A factor-analytic study of creative thinking: 
II. Administration of tests and analysis of results. 
Los Angeles: Univer. of Southern California, 1952, 
Psychol. Lab. Rep., No. 8. 

Siegel, S. Nonparametric statistics for the behavioral 
sciences. New York: McGraw-Hill, 1956. 



